Preface



This is the first of a planned two-volume sequence discussing mathematical
aspects of population genetics theory, with an emphasis on the evolution- 
ary theory. This first volume is intended to discuss the more introductory 
aspects of the theory, with the second volume taking up more advanced 
and more recent aspects. Because of this, this first volume draws heavily 
on the first (1979) edition of this book, since the material in that edition 
may now be taken, to a large extent, as introductory to the contemporary 
theory. A second reason for drawing heavily on the 1979 edition is that 
many present-day students have asked for access to earlier material not 
now easily available. It is indeed remarkable how many results well-known 
in the 1970’s, and appearing in the literature of the time, are rediscovered 
in the modern literature. 

On the other hand, the subject has greatly expanded in scope and depth 
over the last twenty-five years. Many topics have been introduced during
that time, or developed well beyond the level reached in the 1970’s. No 
doubt the most important of these is the development of the theory of 
molecular population genetics. Introductory aspects of molecular population
genetics are taken up in the later chapters of this volume,
but a far more extensive description of the molecular theory will be given 
in Volume II. As one example of this, the theory behind currently active 
haplotype mapping projects will be discussed. To this extent, Volume II 
will be largely data-based. It will thus also form connections between evo- 
lutionary genetics and currently active areas of problems of human genetics 
and bioinformatics. On the other hand, developments of the evolutionary 
theory itself will be considered also, taking up evolutionary questions relating
to many species rather than evolutionary behavior within one species.
Other evolutionary topics such as the game theoretical approach to evo- 
lution, the analysis of gene-environment interactions, gene conversion and 
the extended development of the concept of inclusive fitness, will also be 
discussed in Volume II. 

Despite the emphasis on evolutionary population genetics in this volume, 
some material concerning human genetics, in particular those parts of the 
theory that are best discussed in evolutionary terms, has been included. 
One of the more pleasing developments over the last two decades has been 
a convergence of work in mathematical human genetics and mathemati- 
cal evolutionary genetics, areas which in 1979 had very little overlap. A 
manifestation of this convergence is the recent volume on mathematical 
population genetics and human evolution by Donnelly and Tavaré (1997).

The aim of the 1979 edition, namely to focus on the purely mathemat- 
ical aspects of population genetics theory, is retained in this book, even 
though it is recognized that this provides a narrow and distorted view on 
the subject of population genetics, and indeed of theoretical population ge- 
netics, as a whole. Thus, as in 1979, the book is intended as a complement 
to broader and more balanced accounts of population genetics generally. 
There are now many excellent books available devoted to this broader field, 
but these often do not attempt any depth of mathematical treatment, so 
that there is still a place for a narrowly focussed mathematical treatment. 

Apart from this, there are now several excellent books on specific aspects 
of population genetics theory. Of these it is appropriate to mention that by 
Lynch and Walsh (1998) on quantitative traits, a topic not covered in this 
volume, Epperson (2003) on geographical genetics and books by Christiansen
(2000) and Bürger (2000) on multilocus theory. All these books
carry the theory beyond the introductory level aimed at in this volume. 

One aim of the 1979 volume, not explicitly stated, was to induce 
mathematically-trained workers to enter the population genetics field. This 
aim is continued in this volume, and the mathematical beauty of many of 
the formulas in the molecular genetics chapters of this book should help in 
this endeavor. 

The molecular nature of current data implies that statistical methods 
are used far more frequently than was the case in 1979, with the molecular 
data being used to test various hypotheses about the evolutionary process. 
For the statistical analyses discussed I have adopted the standard conven- 
tion of employing upper-case letters to denote random variables and the 
corresponding lower-case letters to denote their observed values, except in 
cases where this seemed pedantic. This has also sometimes implied replac- 
ing Greek letters sometimes used in the literature for random variables by 
Roman letters. Probability distributions and density functions are written 
in lower case. 

Despite the fact that the earlier chapters of this book are based heavily 
on the 1979 edition, the discussion does sometimes differ substantially from 
that in the 1979 edition, especially where the 1979 viewpoint now seems
to be misguided or out of date. As one example, the discussion of the Fun- 
damental Theorem of Natural Selection is now quite different from that of 
the 1979 edition. The 1979 interpretation of the theorem, standard at the 
time, is now seen as incorrect and has been discarded. However I have no 
illusions about its ability to continue to exist as the textbook interpreta- 
tion, offered to students, especially since the correct interpretation requires 
greater mathematical depth than does the textbook version. 

Current theory in mathematical population genetics emphasizes retrospective
analyses rather than the prospective analyses making up much
of the classical theory. In particular, theory surrounding the Kingman coalescent
process forms, quite appropriately, a significant part of current
research. An introduction to this theory is given in Chapter 10, and a more
extensive discussion will be given in Volume II. One of the aims of this
book is to make connections between the prospective theory that much
of the book considers and this retrospective theory. Apart from this, the
classical prospective theory, considering properties of forward-going evolu- 
tionary processes, is still relevant to retrospective analyses. As one example 
of this, the theory surrounding the coalescent is often best developed by
considering a process moving forward in time from a common ancestor to 
a sample of genes in the present generation, rather than by starting with 
the contemporary sample and moving backward in time to the common 
ancestor. 

Despite the natural current emphasis on the retrospective theory, there 
are several reasons for discussing the prospective theory in some detail in 
this book. The Darwinian theory of evolution continues to be attacked by 
various interest groups, and these attacks are sometimes helped by incorrect 
statements about the prospective evolutionary theory made sometimes even 
by biologists. The many extraordinary statements made by Løvtrup (1987),
for example, illustrate this. 

Even professionals in population genetics contribute to this problem. 
Arguments against evolution as a Darwinian process have been based on the
concept of the substitutional genetic load, which I believe has been a deleterious
concept that should be dropped from the theory. Substitutional
load “theory”, as well as segregational load “theory”, is discussed, and I 
hope debunked, in Section 2.11. The “blind watchmaker” paradigm, peri- 
odically raised by outsiders to population genetics theory as refuting the 
Darwinian process and indeed evolution generally, is discussed, and I hope 
also debunked, in Section 1.6. On two more narrow points where those 
active in areas close to population genetics theory frequently abuse the 
theory, the correct as opposed to the textbook version of the Fundamental 
Theorem of Natural Selection, mentioned above, is described in Sections 
2.9 and 7.4.5. A discussion of the much-misunderstood expression “effective
population size”, often incorrectly used with reference to the history of
the human population, is given in Section 3.7. 

The recent and welcome infusion of population genetics theory into a
variety of disciplines associated with the evolutionary process has not been 
without some problems. Perhaps the most important of these is that it 
has led to an uncritical use of some formulas from the theory without 
due assessment of whether the formulas are appropriate to the situation 
at hand. All formulas in population genetics theory derive from some model of
the evolutionary process, and in some cases this model can be no more 
than a very rough approximation to reality. For this reason a new section 
has been added, in this volume, discussing the modeling process and what 
may reasonably be concluded from the models discussed in the population 
genetics literature. 

On more technical matters, it has not always been possible to use the notation
of various published papers whose results are described here, since in
some cases the notation used in different papers for the same quantity differs,
and in other cases different authors use the same symbol for different
quantities. As in the 1979 version of this book, the notation is not consistent,
so that the symbol “$x_i$” might variously mean the frequency of an
allele in generation $i$, in subpopulation $i$, the frequency of the allele $A_i$, and
so on. On a similar point, I have adopted American spelling but English
punctuation conventions: The latter are more suited to a mathematical 
text. 

It is a pleasure to acknowledge the inspiration I have received from my 
long-time colleagues Bob Griffiths and Geoff Watterson. It is also a pleasure 
to thank Peter Donnelly, John Kingman and Simon Tavaré for an equally
close, albeit long-range, collegial association. I thank various colleagues for 
pointing out typographical errors in the 1979 edition of this book, and 
Alan Rogers for pointing out an error concerning exchangeable model cal- 
culations. Any errors observed in this volume will be gratefully received 
at wewens@sas.upenn.edu and an archive of these will be maintained at 
www.textbook-errata.org 



Philadelphia, Pennsylvania, USA 
October 2003 



Warren J. Ewens 
Introduction to the First Edition



Population genetics occupies a central place in a variety of important bi- 
ological and social undertakings. It has for many years been crucial to 
an understanding of evolutionary processes, of plant and animal breeding 
programs, and of various diseases of particular importance to man. While 
increased research in these areas naturally leads to a greater understand- 
ing of them, it also shows, particularly with the mathematical theory of 
population genetics, that previous arguments have sometimes been mis- 
leading, important points have been glossed over, and our knowledge of 
the genetic behavior of populations is not as firm as might previously have 
been thought. This observation is all the more important because much 
recent controversy on developments within or connected to population genetics
has sometimes relied on now outdated population genetics theory. In
this connection one might mention sociobiology, the effects of genetic manipulation
with recombinant DNA, nature-nurture and heritability studies,
and the knowledge of the detailed constitution of genetic material and the 
consequent possibility of its artificial creation. The importance of these de- 
velopments is immense, as is the need to base controversies on them on 
firm population genetic and other scientific knowledge. 

Population genetics embraces observational, experimental and theoret- 
ical components. While population genetics theory is in large measure 
quantitative, the complexities of Nature ensure that nonmathematical reasoning
eventually outruns the purely mathematical aspects of the theory,
which are necessarily based on simplified models of biological behavior. 
Nevertheless, the purely mathematical aspects of population genetics the- 
ory comprise a very large area of applied mathematical research, and the 
aim of this book is to give an account of this purely mathematical theory.
Thus this book is not about population genetics theory, still less about 
population genetics itself. Indeed, the selection of material that must nec- 
essarily be made is biased towards that with the richest mathematical 
content, and this sometimes implies that topics of greater importance to 
population genetics generally are treated at shorter length than their real 
importance warrants. Given the number of books on population genetics 
and population genetics theory, I believe there is a place for an account 
of the purely mathematical theory, even if biased in this way. Despite this 
broad aim, the first chapter of this book is largely historical and considers 
more general questions on population genetics. This is so since I believe 
such a background is necessary even for a consideration of the purely 
mathematical theory. 

The book has been aimed at the graduate or research level and should be 
supplemented by reading an introductory text. Perhaps the most useful for 
this purpose is C. C. Li’s excellent First Course in Population Genetics. As
indicated above, collateral reading in population genetics theory generally is 
also necessary to place the topics treated in this book in proper perspective. 

What is the value of the mathematical side of population genetics the- 
ory? It may be argued that this merely makes quantitative arguments the 
general nature of which is already clear qualitatively. While in some mea- 
sure this is true, there are many questions where common-sense qualitative 
arguments have led to quite incorrect conclusions on the genetic behavior 
of populations. This is true even for rather simple aspects of the theory 
and, of course, is increasingly true for more complex aspects and also as- 
pects involving stochastic phenomena. This matter is discussed further in 
the concluding remarks of the book, to some extent in the light of examples 
of such questions treated in the preceding chapters. 

The mathematical theory contributes in various degrees to the con- 
troversial areas mentioned in the opening paragraph. The theory of the 
correlation between relatives for a metrical trait, outlined in Chapters 7 
and 8, is the key ingredient in heritability studies and in nature-nurture
allocations. The small but growing mathematical theory of altruistic traits 
concerns perhaps the central question of sociobiology. Detailed knowledge 
of the nature of genetic material has already led to considerable quan- 
titative theory, particularly in the nature of evolutionary processes: It is 
perhaps in this area that, of those mentioned, mathematical population genetics
theory will find its greatest application and from which, in turn, it
will be most influenced so far as its nature and direction are concerned. 
The manipulation of genetic material now possible is perhaps, except in a 
negative sense, the area where mathematical theory is of least value. One 
population geneticist has claimed that the eventual goal of the study of 
evolution is to understand its processes quantitatively and thus be able to 
predict and control its course. The theory in this book, particularly that 
of Chapters 6 and 7, should indicate the difficulty of achieving the first 
aim and the consequent great danger in an attempt to take control of evolutionary
processes generally and in particular (as some enthusiasts would
wish) of human evolution. The complexities of the genetic behavior of pop- 
ulations, as shown by the (still incomplete) mathematical theory, are far 
greater than our power to comprehend and control. 

Various points concerning the presentation of this book should be 
mentioned. Aiming to concentrate on the mathematical theory, I have em- 
phasized, particularly in Chapter 2, that such theory rests on models of 
biological reality which, no matter how simplistic, must be analyzed on their 
own without the injection of extraneous assumptions during the analysis. If 
such assumptions are brought in, and the assumptions injected contradict 
those implicit in the model, in principle any result, no matter how incor- 
rect, can arise. Of course the conclusions reached from a model must be 
treated with caution, depending on the reality of the initial assumptions 
made, but this is a different matter from interfering with the analysis of a 
model in mid-stream. Several incorrect conclusions in population genetics 
have arisen from such ad hoc interference. 

So far as terminology is concerned I have followed the standard usage of 
the subject, even when this is perhaps unsatisfactory. Two unfortunate ex- 
pressions, “gene frequency” (instead of the more logical “allele frequency”) 
and “additive genetic variance” (instead of, perhaps, “genic variance”) are 
entrenched in the literature, and I have used them here except on specific 
occasions when a more precise usage seemed necessary. The notation is 
not consistent throughout the book. Thus the symbol “$x_i$” might variously
mean the frequency of an allele in generation $i$, in subpopulation $i$, the frequency
of the allele $A_i$, and so on. Consistency would lead to cumbersome
notation, and the context should always make clear, even if no explicit 
explanation is given, what any symbol stands for. 

I have cited fewer rather than more references during this book, concen- 
trating on those accounts that appear to be definitive, innovative, the most 
recent or in some other way important. 

This book has benefited greatly from the advice and criticism of many 
friends and colleagues, of whom I should mention Marc Feldman, Wal- 
ter Fitch, Bob Griffiths, Sam Karlin, Ray Littler, Tom Nagylaki, Eugene 
Seneta, Richard Spielman and Glenys Thomson. I must thank in partic- 
ular Frank Norman for many patient hours spent explaining to me the 
intricacies of mathematical diffusion theory, John Gillespie for his constant 
advice on biological, evolutionary and mathematical questions, and above 
all Geoff Watterson for his most careful and detailed reading of drafts of 
this book and for much discussion and guidance on the topics it considers. 
Naturally I am responsible for all errors and obscurities in the final version. 



Melbourne, Victoria, Australia, and 
Philadelphia, Pennsylvania, USA 
December 1976 to October 1978 



Warren J. Ewens 




1 

Historical Background 



1.1 Biometricians, Saltationists and Mendelians 

Population genetics theory was initially developed, in the 1920’s and 1930’s,
by Fisher, Haldane and Wright, and current theory still bears the im- 
print of the work of these three great masters. Indeed, so fundamental was 
their contribution that even today, it is difficult to move forward from the 
paradigms that they introduced. Such a move forward is, however, neces- 
sary, especially because of the availability of data from the human genome 
project and other genome projects, and the need to analyze these data 
using population genetic theory methods. 

To make any such forward move, and to establish any new paradigm, will 
nevertheless require an understanding of the theory established by Fisher, 
Haldane and Wright, as well as an understanding of the historical context 
in which they found themselves. In short, their objective was to formulate 
an evolutionary paradigm based on the Mendelian hereditary mechanism. 
Perhaps the major difficulty in doing this arose from the divisions on evo- 
lutionary questions following the rediscovery of Mendelism in 1900. We 
therefore start by describing these divisions, which reinforced an already 
existing division among biologists about the nature of evolution. 

The Origin of Species was published in 1859. Apart from the controver- 
sies it brought about on a nonscientific level, it set biologists at odds as 
to various aspects of the theory. That evolution had occurred was not, on 
the whole, questioned. What was more controversial was the claim that 
the agency bringing about evolution was natural selection, and, among selectionists,
there was disagreement about the nature of selectively
induced evolutionary changes. Darwin adhered to the “gradualist” point of
view, that changes in the nature of organisms in populations were grad- 
ual and incremental. Some of those who, in general, were his strongest 
supporters, for example T. H. Huxley and Francis Galton, were “saltationists”,
believing that evolutionary changes most often occur in “jumps” of
not inconsiderable magnitude. Two evolutionary schools of thought devel- 
oped from these two points of view. Although any attempt to describe in 
brief terms the long and complex controversies that followed is bound to be 
incomplete, it is nevertheless possible to trace in general terms the threads 
of the arguments followed by members of both schools. A more detailed 
account of these matters is given by Provine (1971).

Before doing so, it must be remembered that Mendel’s work, and hence 
the mechanism of heredity, was in effect unknown before 1900 and that in 
so far as a common view of heredity existed, it would have been that the 
characteristics of an individual are, or tend to be, a blending of the corre- 
sponding characteristics of his parents. It is, however, interesting to note 
that in a letter to Darwin in 1875 Galton came almost by pure reasoning to 
a proposition about the hereditary mechanism that was very close to the 
Mendelian one. Unfortunately his line of thought appears not to have been 
pursued: If it had been, the course of evolutionary thought during the next 
hundred years would have been very different. Details of Galton’s letter,
and comments on it, are given by Olby (1965). 

The blending hypothesis brought perhaps the most substantial scientific 
objection to Darwin’s theory. It is easy to see that with random mating, 
the variance in a population for any characteristic will, under the blending 
theory, decrease by a factor of one-half in each generation. Thus uniformity 
of characteristics would essentially be obtained after a few generations, so 
that eventually no variation would exist upon which natural selection could 
act. Since, of course, such uniformity is not observed, this argument is in- 
complete. But since variation of the degree observed could only occur by 
postulating further factors of strong effect which cause the characteristics 
of offspring to deviate from those of their parents, it cannot then be reason- 
ably argued that selectively favored parents produce offspring who closely 
resemble them and who are thus themselves selectively favored. This ar- 
gument was recognized by Darwin as a major obstacle to his theory of 
evolution through natural selection, and it is interesting to note that later 
versions of the Origin were, unfortunately, somewhat influenced by this 
argument. 
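
The halving of the variance under blending can be checked by direct calculation. The Python sketch below is an illustration only. It assumes that each offspring value is exactly the average of two parental values drawn independently from the current population; the starting trait distribution `start` and the function names are arbitrary choices made for this example.

```python
from fractions import Fraction
from itertools import product


def variance(dist):
    """Population variance of a discrete trait distribution {value: probability}."""
    mean = sum(p * v for v, p in dist.items())
    return sum(p * (v - mean) ** 2 for v, p in dist.items())


def blend_generation(dist):
    """One generation of random mating under blending inheritance.

    Each offspring value is the average of two parental values drawn
    independently from the current distribution (an assumed idealization).
    """
    nxt = {}
    for (v1, p1), (v2, p2) in product(dist.items(), repeat=2):
        child = (v1 + v2) / 2
        nxt[child] = nxt.get(child, Fraction(0)) + p1 * p2
    return nxt


def blending_variances(dist, generations):
    """Variance in the starting generation and in each later generation."""
    out = [variance(dist)]
    for _ in range(generations):
        dist = blend_generation(dist)
        out.append(variance(dist))
    return out


# Arbitrary illustrative starting distribution of a trait value.
start = {
    Fraction(0): Fraction(1, 4),
    Fraction(1): Fraction(1, 2),
    Fraction(4): Fraction(1, 4),
}
variances = blending_variances(start, 5)
for generation, v in enumerate(variances):
    print(generation, v)
```

Exact rational arithmetic is used so that the factor of one-half appears exactly rather than up to rounding error.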

Galton’s role in the controversy between the gradualists and the salta- 
tionists was somewhat ambiguous. On the one hand he was himself a 
believer in the saltation theory, and this no doubt influenced him in ad- 
vancing in 1875 the hereditary theory referred to above. On the other hand, 
he pursued a close intellectual and personal relationship with Darwin and, 
through this, attempted to quantify the gradualist evolutionary process. 
This led him to introduce the statistical concepts of correlation and regression,
which became the main tools of a group of scientists, later known
as biometricians, who were one of the inheritors of the gradualist Darwin 
theory. This group’s mathematical research in quantitative evolution began 
in the 1890s under the leadership of W.F.R. Weldon and Karl Pearson. At 
the same time the saltationists gained further adherents, notably William 
Bateson, and the struggle between the two groups became more intense as 
the century drew to a close. 

The year 1900 saw the rediscovery of Mendelism. The particulate nature 
of this theory was of course appealing to the saltationists. Rather soon 
many biologists believed in a non-Darwinian process of evolution through 
mutational jumps - the view that “Mendelism had destroyed Darwinism” 
was not uncommon. On the other hand, the biometricians continued to 
believe in the Darwinian theory of gradualist evolution through natural 
selection and were thus, in the main, disinclined to believe in the Mendelian 
mechanism, or at least that this mechanism was of fundamental importance 
in evolution. 

It would be pointless to follow in detail the sometimes bitter acrimony 
that then followed. Even the inspired arguments of Yule (1902), based on a 
mathematical analysis of the Mendelian system, that Mendelism and Dar- 
winism could be reconciled, were largely ignored. And yet, paradoxically, 
Darwinism and Mendelism are not incompatible. Indeed, the former re- 
lies crucially on the latter, and further it would be difficult to conceive 
of a Mendelian system without some form of natural selection associated 
with it. To see why this should be so, it is now necessary to turn to the 
beginnings of the mathematical theory of population genetics. 



1.2 The Hardy-Weinberg Law

We consider a random-mating monoecious population which is so large
that genotype frequency changes may be treated as deterministic, and focus
attention on a given gene locus at which two alleles may occur, namely
$A_1$ and $A_2$. Suppose that in any generation the proportions of the three
genotypes $A_1A_1$, $A_1A_2$ and $A_2A_2$ are $X$, $2Y$, and $Z$, respectively. Since
random mating obtains, the frequency of matings of the type $A_1A_1 \times A_1A_1$
is $X^2$, that of $A_1A_1 \times A_1A_2$ is $4XY$, and so on. We now consider the
outcomes of each of these matings. If the very small probability of mutation
is ignored, and if there are no fitness differentials between genotypes,
elementary Mendelian rules indicate that the outcome of an $A_1A_1 \times A_1A_1$
mating must be $A_1A_1$ and that in an indefinitely large population, half the
$A_1A_1 \times A_1A_2$ matings will produce $A_1A_1$ offspring, and the other half will
produce $A_1A_2$ offspring, with similar results for the remaining matings.
It follows that since $A_1A_1$ offspring can be obtained only from $A_1A_1 \times A_1A_1$
matings (with overall frequency 1 for such matings), from $A_1A_1 \times A_1A_2$
matings (with overall frequency $\tfrac{1}{2}$ for such matings), and from
$A_1A_2 \times A_1A_2$ matings (with frequency $\tfrac{1}{4}$ for such matings), and since the
frequencies of these matings are $X^2$, $4XY$, $4Y^2$, the frequency $X'$ of $A_1A_1$
in the following generation is

$$X' = X^2 + \tfrac{1}{2}(4XY) + \tfrac{1}{4}(4Y^2) = (X + Y)^2. \tag{1.1}$$

Similar considerations give the frequencies $2Y'$ of $A_1A_2$ and $Z'$ of $A_2A_2$ as

$$2Y' = \tfrac{1}{2}(4XY) + \tfrac{1}{2}(4Y^2) + 2XZ + \tfrac{1}{2}(4YZ) = 2(X + Y)(Y + Z), \tag{1.2}$$

$$Z' = \tfrac{1}{4}(4Y^2) + \tfrac{1}{2}(4YZ) + Z^2 = (Y + Z)^2. \tag{1.3}$$

The frequencies $X''$, $2Y''$ and $Z''$ for the next generation are found by
replacing $X'$, $2Y'$ and $Z'$ by $X''$, $2Y''$ and $Z''$ and $X$, $2Y$ and $Z$ by
$X'$, $2Y'$ and $Z'$ in (1.1)-(1.3). Thus, for example, using (1.1) and (1.2),

$$\begin{aligned}
X'' &= (X' + Y')^2 \\
    &= (X + Y)^2 \\
    &= X',
\end{aligned}$$

and similarly it is found that $Y'' = Y'$, $Z'' = Z'$. Thus, the genotype frequencies
established by the second generation are maintained in the third
generation and consequently in all subsequent generations. Frequencies
having this property can be characterized as those satisfying the relation

$$(Y')^2 = X'Z'. \tag{1.4}$$

Clearly if this relation holds in the first generation, so that 

$$Y^2 = XZ, \tag{1.5}$$

then not only would there be no change in genotypic frequencies between 
the second and subsequent generations, but also these frequencies would 
be the same as those in the first generation. Populations for which (1.5) is 
true are said to have genotypic frequencies in Hardy-Weinberg form.
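
The recurrence relations (1.1)-(1.3) can be checked numerically by enumerating the mating table directly. The following Python sketch is an illustration under the assumptions stated in the text (random mating, no mutation, no fitness differentials). The coding of genotypes by their number of $A_1$ alleles, the function name `next_generation` and the starting frequencies are choices made for this example only.

```python
from fractions import Fraction
from itertools import product


def next_generation(freqs):
    """Genotype frequencies after one round of random mating.

    freqs maps the number of A1 alleles in a genotype to its frequency:
    {2: X, 1: 2Y, 0: Z}. Matings are enumerated as ordered pairs, so the
    unordered mating A1A1 x A1A2 (frequency 4XY in the text) appears
    here as two ordered pairs of frequency 2XY each.
    """
    nxt = {2: Fraction(0), 1: Fraction(0), 0: Fraction(0)}
    for (g1, f1), (g2, f2) in product(freqs.items(), repeat=2):
        mating = f1 * f2
        # Mendelian segregation: probability that each parent transmits A1.
        p1, p2 = Fraction(g1, 2), Fraction(g2, 2)
        nxt[2] += mating * p1 * p2
        nxt[1] += mating * (p1 * (1 - p2) + (1 - p1) * p2)
        nxt[0] += mating * (1 - p1) * (1 - p2)
    return nxt


def allele_frequency(freqs):
    """Frequency x = X + Y of the allele A1."""
    return freqs[2] + freqs[1] / 2


# Arbitrary starting frequencies X, 2Y, Z that are not in Hardy-Weinberg form.
gen0 = {2: Fraction(1, 2), 1: Fraction(1, 10), 0: Fraction(2, 5)}
gen1 = next_generation(gen0)
gen2 = next_generation(gen1)

print("generation 1:", gen1)
print("unchanged in generation 2:", gen2 == gen1)
print("relation (1.4) holds:", (gen1[1] / 2) ** 2 == gen1[2] * gen1[0])
print("allele frequency unchanged:", allele_frequency(gen0) == allele_frequency(gen1))
```

With these starting values $X + Y = 11/20$, so (1.1)-(1.3) give $X' = 121/400$, $2Y' = 198/400$ and $Z' = 81/400$, which the enumeration reproduces.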

We also observe that whereas there might be genotype frequency changes 
between generation 1 and generation 2, the frequency $x = X + Y$ of the
allele $A_1$ does not change between these two generations. Nor of course
does it change between any further generations. In accordance with common
practice, we shall often use the expression “gene frequency”, and an
expression such as “the frequency of the gene $A_1$”, rather than the “allele
frequency” terminology employed above.

Since $X + 2Y + Z = 1$, only two of the frequencies $X$, $2Y$ and $Z$ are
independent. If, further, (1.5) holds, only one frequency is independent.
Examination of the recurrence relations (1.1)-(1.3) shows that the most
convenient quantity for independent consideration is the frequency $x$ of the
allele $A_1$. These conclusions may be summarized in the form of a theorem:

Theorem (Hardy-Weinberg). Under the assumptions stated, a population
having genotypic frequencies $X$ (of $A_1A_1$), $2Y$ (of $A_1A_2$) and $Z$ (of $A_2A_2$)
achieves, after one generation of random mating, stable genotypic frequencies
$x^2$, $2x(1 - x)$, $(1 - x)^2$ where $x = X + Y$ and $1 - x = Y + Z$. If the
initial frequencies $X$, $2Y$, $Z$ are already of the form $x^2$, $2x(1 - x)$, $(1 - x)^2$,
then these frequencies are stable for all generations.

Numerical examples of this theorem were given by Castle (1903), who 
possibly (cf. Keeler (1968)) knew the theorem in full generality, by Yule 
(1906), and by Pearson (1904). The first published general proof was by 
Hardy (1908) and Weinberg (1908), and it is after these authors that the 
theorem has become known, normally as the “Hardy-Weinberg law”.

Why is this rather simple theorem, or as it is more frequently called 
“law”, so important? Unfortunately it is important for two different rea- 
sons, one purely technical, and concentration on the technical reason has 
sometimes tended to obscure its truly basic value. The technical point is 
that if, as we may reasonably assume in a random-mating population, 
equation (1.5) is true, the mathematical behavior of the population can 
be examined in terms of the single frequency x rather than in terms of 
the pair (X, Y); this is certainly a considerable convenience, but it is not 
fundamentally important. The really important part of the theorem lies in 
the stability behavior. If no external forces act, there is no intrinsic ten- 
dency for any variation present in the population, that is, variation caused 
by the existence of the three different genotypes, to disappear. This shows 
immediately that the major earlier criticism of Darwinism, namely the fact 
that variation decreases rapidly under the blending theory, does not apply 
with Mendelian inheritance. It is clear directly from the Hardy-Weinberg 
Law that under a Mendelian system of inheritance, variation tends to be 
maintained. 
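
The contrast with blending can be made concrete with a small calculation. The Python sketch below is an illustration only and rests on assumptions not made in the text: each genotype is scored by a hypothetical trait value equal to its number of $A_1$ alleles, the starting genotype frequencies are arbitrary, and the blending figures simply apply the halving of variance per generation to the same initial variance.

```python
from fractions import Fraction


def hardy_weinberg(X, two_Y, Z):
    """Genotype frequencies after one generation of random mating."""
    x = X + two_Y / 2  # frequency of the allele A1
    return x * x, 2 * x * (1 - x), (1 - x) * (1 - x)


def trait_variance(X, two_Y, Z):
    """Variance of a hypothetical trait scored 2, 1, 0 for A1A1, A1A2, A2A2."""
    mean = 2 * X + two_Y
    mean_square = 4 * X + two_Y
    return mean_square - mean ** 2


# Arbitrary starting frequencies X, 2Y, Z, not in Hardy-Weinberg form.
freqs = (Fraction(1, 2), Fraction(1, 10), Fraction(2, 5))

mendelian = []
for _ in range(6):
    mendelian.append(trait_variance(*freqs))
    freqs = hardy_weinberg(*freqs)

# Blending comparison: the initial variance halved once per generation.
blending = [mendelian[0] / 2 ** g for g in range(6)]

for g, (m, b) in enumerate(zip(mendelian, blending)):
    print(g, "Mendelian:", m, "blending:", b)
```

In this example the Mendelian variance changes once, as the population moves into Hardy-Weinberg form, and is then constant at $2x(1 - x)$, whereas the blending variance continues to fall.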

Of course, the action of selection itself often tends to destroy varia- 
tion; this qualification is of some importance and we shall return to this 
point later and will find that the rate of loss of variation in any realistic 
Mendelian scheme involving selection is far less than the rate under any 
realistic blending scheme. 

It is the “quantal” nature of the gene that leads to the stability behavior 
described by the Hardy-Weinberg law. It is thus interesting that the year 
in which the Mendelian theory was rediscovered, 1900, was the same year 
as the introduction of the quantum theory in physics. Both theories have 
been fundamental and crucial in their respective spheres. One can even 
claim that if there is intelligent life, that is, life that has evolved via natural 
selection, elsewhere in the universe, the heredity mechanism involved must 
be a quantal, maybe a Mendelian, one, since otherwise it is not clear how the 
variation necessary for evolution by natural selection can be maintained.
Thus the Hardy-Weinberg law shows that far from being incompatible,
Darwinism and Mendelism are almost inseparable. It would be difficult to 
think of a hereditary process other than the quantal Mendelian scheme in 
which natural selection could act with such efficiency, while on the other 
hand fitness differentials between genotypes will normally lead to changes 
in gene frequencies and thus ultimately to evolution. We generalize the 
Hardy-Weinberg law later in this book to the case where more than two
alleles are possible at the locus in question and also to the multilocus case. 
We shall also discuss extensions of it to non-random-mating populations. 
For the moment we shall be content with noting its historical significance. 

It was thus beginning to become clear by the end of the first decade of 
the 20th century that a reconciliation between Darwinism and Mendelism 
was not only possible but indeed inevitable. In 1911 this was already ap- 
parent to a young student of mathematics who read, during that year, 
a paper on “Heredity” to the Cambridge University Eugenics Society, in 
which he stressed the necessity for this reconciliation. Such a reconciliation 
would carry with it a requirement to interpret, on Mendelian principles, 
the large bodies of data assembled by the biometricians on the correla- 
tions between relatives for various physical characteristics. Several years 
later R. A. Fisher, the young student in question, wrote a landmark paper
(Fisher, 1918) in population genetics in which this reconciliation was
achieved. 

Special cases of these correlations had been treated earlier by Pearson 
(1904) and Yule (1906), but Fisher’s (1918) work was the first one to con- 
sider the problem in a rather complete degree of generality. We therefore 
consider the approach he used, since several of the quantities which play 
a key role in his argument will appear subsequently to have considerable 
evolutionary importance. 



1.3 The Correlation Between Relatives 

Consider any character which is determined entirely by a locus $A$ at which
occur alleles $A_1$ and $A_2$. Suppose that all $A_1A_1$ individuals have
measurement $m_{11}$ for this character, that all $A_1A_2$ individuals have
measurement $m_{12}$, and that all $A_2A_2$ individuals have measurement
$m_{22}$. For the moment we assume no environmental contributions: once we
know the genotype of any individual, we assume that we know the value of his
measurement.

Suppose that random mating obtains with respect to this character and
that the frequencies of $A_1A_1$, $A_1A_2$ and $A_2A_2$ are in Hardy-Weinberg
form $x^2$, $2x(1-x)$ and $(1-x)^2$, respectively. Then the mean value
$\bar{m}$ of this measurement is given by

$$\bar{m} = x^2 m_{11} + 2x(1-x)m_{12} + (1-x)^2 m_{22},$$

and the variance $\sigma^2$ in the measurement is

$$\sigma^2 = x^2(m_{11} - \bar{m})^2 + 2x(1-x)(m_{12} - \bar{m})^2 + (1-x)^2(m_{22} - \bar{m})^2. \tag{1.6}$$



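Table 1.1. Probabilities of father-son combinations of genotype (G) and measurement (M).

                                          Son
                       G     $A_1A_1$       $A_1A_2$       $A_2A_2$
                       M     $m_{11}$       $m_{12}$       $m_{22}$
Father   G          M
         $A_1A_1$   $m_{11}$  $x^3$          $x^2(1-x)$     $0$
         $A_1A_2$   $m_{12}$  $x^2(1-x)$     $x(1-x)$       $x(1-x)^2$
         $A_2A_2$   $m_{22}$  $0$            $x(1-x)^2$     $(1-x)^3$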



What is the covariance between father and son with respect to this
measurement? Suppose first that the father is $A_1A_1$. Then the son will be
$A_1A_1$ if the mother transmits an $A_1$ gene to him, an event with
probability $x$. Similarly the son will be $A_1A_2$ with probability $1 - x$.
The father himself will be $A_1A_1$ with probability $x^2$. Continuing in this
way it is possible to draw up a table of the probabilities of the various
father-son combinations in genotype and hence in the character measured.
Using G for genotype and M for measurement, we eventually find the values
shown in Table 1.1.

The covariance between the measurement for the father and that for
the son, assuming no change in the frequency of $A_1$ between the two
generations, is thus

$$
\begin{aligned}
& x^3 m_{11}^2 + 2x^2(1-x)m_{11}m_{12} + x(1-x)m_{12}^2 + 2x(1-x)^2 m_{12}m_{22} + (1-x)^3 m_{22}^2 - \bar{m}^2 \\
&\quad = x(1-x)\{x m_{11} + (1-2x)m_{12} - (1-x)m_{22}\}^2.
\end{aligned} \tag{1.7}
$$

The correlation between the two measurements, found by dividing the
covariance by the variance (since the variance for sons is the same as that
for fathers), is then

$$x(1-x)\{x m_{11} + (1-2x)m_{12} - (1-x)m_{22}\}^2 / \sigma^2. \tag{1.8}$$
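The covariance in (1.7) can be checked numerically by summing over the nine father-son cells of Table 1.1. The following is a minimal sketch of that check; the function names and the numerical values of the frequency and the measurements are illustrative assumptions, not quantities taken from the text.

```python
# Sketch: father-son covariance computed cell by cell from the joint table,
# compared with the closed form x(1-x){x*m11 + (1-2x)*m12 - (1-x)*m22}^2.
# All names and numerical inputs are hypothetical, chosen for illustration.

def joint_table(x):
    """Joint probabilities P(father genotype, son genotype).

    Rows are fathers and columns are sons, both ordered A1A1, A1A2, A2A2.
    """
    y = 1.0 - x
    return [
        [x**3,     x**2 * y, 0.0],
        [x**2 * y, x * y,    x * y**2],
        [0.0,      x * y**2, y**3],
    ]


def covariance_from_table(x, m11, m12, m22):
    """E[father * son] minus the squared mean, using the joint table."""
    m = [m11, m12, m22]
    table = joint_table(x)
    mean = x**2 * m11 + 2 * x * (1 - x) * m12 + (1 - x) ** 2 * m22
    cross = sum(table[i][j] * m[i] * m[j] for i in range(3) for j in range(3))
    return cross - mean**2


def covariance_closed_form(x, m11, m12, m22):
    """Right-hand side of the covariance formula."""
    return x * (1 - x) * (x * m11 + (1 - 2 * x) * m12 - (1 - x) * m22) ** 2


x, m11, m12, m22 = 0.3, 10.0, 14.0, 11.0
print(covariance_from_table(x, m11, m12, m22))
print(covariance_closed_form(x, m11, m12, m22))
```

The two printed values should agree up to rounding error for any frequency between 0 and 1 and any choice of measurements.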



It is useful to write this expression in a different form. If we define

$$
\begin{aligned}
\sigma_A^2 &= 2x(1-x)\{x m_{11} + (1-2x)m_{12} - (1-x)m_{22}\}^2, \\
\sigma_D^2 &= x^2(1-x)^2\{2m_{12} - m_{11} - m_{22}\}^2,
\end{aligned} \tag{1.9}
$$

the expression (1.8) is clearly

$$\tfrac{1}{2}\,\sigma_A^2 / \sigma^2. \tag{1.10}$$

Furthermore, it is simply a matter of algebra to show that

$$\sigma^2 = \sigma_A^2 + \sigma_D^2, \tag{1.11}$$

and in view of these relations it is of some interest to find interpretations
for $\sigma_A^2$ and $\sigma_D^2$.
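The decomposition (1.11) of the total variance into its two components is easy to confirm numerically. The sketch below does so for one arbitrary set of inputs; the function name and the numerical values are assumptions made only for illustration.

```python
# Sketch: total variance (1.6) against the sum of the additive and
# dominance components. Names and numbers are hypothetical.

def variance_components(x, m11, m12, m22):
    """Return (mean, total, additive, dominance) at allele frequency x."""
    y = 1.0 - x
    mean = x**2 * m11 + 2 * x * y * m12 + y**2 * m22
    total = (
        x**2 * (m11 - mean) ** 2
        + 2 * x * y * (m12 - mean) ** 2
        + y**2 * (m22 - mean) ** 2
    )
    additive = 2 * x * y * (x * m11 + (1 - 2 * x) * m12 - y * m22) ** 2
    dominance = x**2 * y**2 * (2 * m12 - m11 - m22) ** 2
    return mean, total, additive, dominance


mean, total, additive, dominance = variance_components(0.3, 10.0, 14.0, 11.0)
print(mean, total, additive, dominance)
print(abs(total - (additive + dominance)) < 1e-9)
```

With these inputs most of the variance is dominance variance, because the heterozygote measurement lies well outside the range of the two homozygote measurements.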

In order to find an interpretation for $\sigma_A^2$ we consider what changes
are made in the measurement in question if we replace an $A_1$ allele by an
$A_2$ allele in some individual. The effect of doing this will, in general,
depend on whether the replacement is made in an $A_1A_1$ individual or an
$A_1A_2$ individual. The change is $m_{12} - m_{11}$ in the first case and
$m_{22} - m_{12}$ in the second, and these will not generally be equal. We thus
try to find some expression for this effect which in some sense is as close as
possible to these two values, using the concept of a weighted least-squares
fit.

Suppose we fit the measurements $m_{11}$, $m_{12}$ and $m_{22}$ as closely as
possible, in the sense of weighted least squares, by values of the form
$\bar{m} + 2\alpha_1$, $\bar{m} + \alpha_1 + \alpha_2$, $\bar{m} + 2\alpha_2$.
Differentiation of the expression $S$, defined by

$$S = x^2(m_{11} - \bar{m} - 2\alpha_1)^2 + 2x(1-x)(m_{12} - \bar{m} - \alpha_1 - \alpha_2)^2 + (1-x)^2(m_{22} - \bar{m} - 2\alpha_2)^2,$$

with respect to $\alpha_1$ and $\alpha_2$, with the derivatives subsequently set
to 0, gives eventually

$$
\begin{aligned}
\alpha_1 &= x(m_{11} - \bar{m}) + (1-x)(m_{12} - \bar{m}), \\
\alpha_2 &= x(m_{12} - \bar{m}) + (1-x)(m_{22} - \bar{m})
\end{aligned}
$$

as the best-fitting values. With this choice of $\alpha_1$ and $\alpha_2$, the
equation

$$x\alpha_1 + (1-x)\alpha_2 = 0 \tag{1.13}$$

is automatically satisfied. Often the minimization procedure is carried out
subject to the requirement that this equation holds, but since at the minimizing
values this requirement is automatically satisfied, imposition of the re- 
quirement is not necessary. By contrast, when the above calculations are 
generalized to the case of many gene loci, a requirement of the form (1.13) 
will be needed. 

We define the average effect of substituting $A_2$ for $A_1$ by

$$\alpha_2 - \alpha_1 = x(m_{12} - m_{11}) + (1-x)(m_{22} - m_{12}). \tag{1.14}$$

When more than two alleles are involved we shall find it more convenient to
adopt a slightly different usage and to call $\alpha_1$ the average effect of
$A_1$ and $\alpha_2$ the average effect of $A_2$. The value (1.14) could have
been found almost immediately by taking a weighted average of
$m_{12} - m_{11}$ and $m_{22} - m_{12}$. The present approach, while less direct,
does on the other hand yield further information. The minimum value of the
expression $S$ is easily seen to be

$$x^2(1-x)^2(2m_{12} - m_{11} - m_{22})^2, \tag{1.15}$$

and the difference between this and $\sigma^2$, namely the sum of squares
removed from $\sigma^2$ by fitting the parameters $\alpha_1$ and $\alpha_2$, is

$$2x(1-x)\{x m_{11} + (1-2x)m_{12} - (1-x)m_{22}\}^2. \tag{1.16}$$



The expression (1.16) is identical to the quantity $\sigma_A^2$ defined in
(1.9), while the residual sum of squares (1.15) is identical to the quantity
$\sigma_D^2$ defined in (1.9). Because $\sigma_A^2$ can be derived in the way
just outlined, it might reasonably be called the genic or allelic variance: it
is that part of the total variance in the character which can be accounted for
by the average effects of the alleles $A_1$ and $A_2$, used in an additive
fashion. A frequently used name for $\sigma_A^2$ is the “additive genetic
variance” in the character measured, the word “genetic” meaning here “relating
to genes”. This usage is perhaps unfortunate, but because it is well
established we follow it in this book. The residual variance $\sigma_D^2$ is
called the dominance variance. Except for the trivial cases $x = 0$, $x = 1$,
it is zero only if $m_{12} = \tfrac{1}{2}(m_{11} + m_{22})$,
that is when there is no dominance in the measurement in question.
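The least-squares argument can be illustrated numerically. The sketch below evaluates the weighted sum of squares $S$ at the best-fitting average effects, confirms that nearby values do worse, and compares the minimum with the dominance variance (1.15). The function names, the perturbation size and the numerical inputs are all assumptions made for illustration.

```python
# Sketch: weighted least-squares fit of the three genotype measurements by
# mean + 2*a1, mean + a1 + a2, mean + 2*a2. Names and numbers are hypothetical.

def population_mean(x, m11, m12, m22):
    return x**2 * m11 + 2 * x * (1 - x) * m12 + (1 - x) ** 2 * m22


def average_effects(x, m11, m12, m22):
    """Best-fitting a1 and a2 from the closed-form solution in the text."""
    mean = population_mean(x, m11, m12, m22)
    a1 = x * (m11 - mean) + (1 - x) * (m12 - mean)
    a2 = x * (m12 - mean) + (1 - x) * (m22 - mean)
    return a1, a2


def weighted_sum_of_squares(x, m11, m12, m22, a1, a2):
    """The expression S for arbitrary trial values a1 and a2."""
    mean = population_mean(x, m11, m12, m22)
    return (
        x**2 * (m11 - mean - 2 * a1) ** 2
        + 2 * x * (1 - x) * (m12 - mean - a1 - a2) ** 2
        + (1 - x) ** 2 * (m22 - mean - 2 * a2) ** 2
    )


x, m11, m12, m22 = 0.3, 10.0, 14.0, 11.0
a1, a2 = average_effects(x, m11, m12, m22)
s_min = weighted_sum_of_squares(x, m11, m12, m22, a1, a2)
dominance = x**2 * (1 - x) ** 2 * (2 * m12 - m11 - m22) ** 2

print(a1, a2, x * a1 + (1 - x) * a2)   # the last value should be close to 0
print(s_min, dominance)                # these two should agree

# Any small move away from (a1, a2) should increase S.
for da1, da2 in [(0.01, 0.0), (-0.01, 0.0), (0.0, 0.01), (0.01, -0.01)]:
    trial = weighted_sum_of_squares(x, m11, m12, m22, a1 + da1, a2 + da2)
    print(trial > s_min)
```

Because $S$ is a positively weighted sum of squares of three linear forms in two unknowns, it has a unique minimum whenever $0 < x < 1$, which is why every perturbation in the loop increases it.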

We may then express the result (1.10) as follows: Under the conditions
assumed, the correlation between father and son in the measurement
considered is half the ratio of the additive genetic variance to the total
variance in the measurement. If we denote this ratio by $p^2$, this result
becomes

$$\text{corr(father, son)} = \tfrac{1}{2}p^2. \tag{1.17}$$

This correlation is always nonnegative, and will only take the value zero
when $x = (m_{12} - m_{22})/(2m_{12} - m_{11} - m_{22})$, a possibility that can
arise only if $m_{12}$ exceeds both $m_{11}$ and $m_{22}$, or if $m_{12}$ is less
than both $m_{11}$ and $m_{22}$. We emphasize strongly the fact that this
correlation has been found by basing all calculations on the Mendelian nature
of the hereditary process.

A table analogous to Table 1.1, considering in this case full sibs, shows
that under the same assumptions made above,

$$\text{corr(full sibs)} = \tfrac{1}{2}p^2 + \tfrac{1}{4}\delta^2, \tag{1.18}$$

where $\delta^2 = \sigma_D^2/\sigma^2$. Similar considerations, using tables of
Mendelian associations rather more complex than those in Table 1.1, show that
under the same assumptions,

$$\text{corr(uncle-nephew)} = \tfrac{1}{4}p^2, \tag{1.19}$$

$$\text{corr(double first cousins)} = \tfrac{1}{4}p^2 + \tfrac{1}{16}\delta^2, \tag{1.20}$$

and so on.
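The correlations (1.17) to (1.20) depend only on the two variance components, so they are simple to tabulate. The following sketch assumes the single-locus setting of this section with no environmental variance, so that the total variance is the sum of the additive and dominance components; the function name and the example inputs are hypothetical.

```python
# Sketch: correlations between relatives from the additive and dominance
# variances, assuming total variance = additive + dominance.
# The function name and the numerical inputs are illustrative assumptions.

def relative_correlations(additive, dominance):
    total = additive + dominance
    p2 = additive / total        # ratio of additive to total variance
    delta2 = dominance / total   # ratio of dominance to total variance
    return {
        "father-son": 0.5 * p2,
        "full sibs": 0.5 * p2 + 0.25 * delta2,
        "uncle-nephew": 0.25 * p2,
        "double first cousins": 0.25 * p2 + delta2 / 16.0,
    }


for relationship, value in relative_correlations(0.3402, 2.1609).items():
    print(relationship, round(value, 4))
```

When the dominance variance is zero the full-sib and father-son correlations coincide, which gives a quick sanity check on any implementation.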

Having obtained these results, Fisher (1918) then considered more com- 
plex situations, in particular cases where more than two alleles are possible 
at each locus, where characters are determined by the alleles at many loci, 
and where assortative mating obtains. We shall not pursue the complexities
associated with assortative mating: They are touched on briefly in Chapter
8. We also describe, in Chapter 7, a more efficient way of finding these
correlations in the random-mating case. One generalization of these results
is, however, straightforward. Fisher showed that for the one-locus multiple
alleles case, the correlation formulae (1.17), (1.18), (1.19) and (1.20) remain 
unaltered provided that the additive and dominance variances are defined 
in the natural way through a generalization of the least-squares procedure 
just described. This is demonstrated in Section 2.4. 

The analysis of the correlation for characters determined by many loci 
is far more complex than that for characters determined by one locus, 
since interactive effects must then be taken into account. In the case of a 
character which is correlated with fitness it is very hard to determine how 
important these interactive effects might be. If, however, the character is 
not correlated with fitness, we may reasonably assume (see Section 7.6) 
that 



$$\text{freq}(A_iA_jB_kB_l \cdots) = \text{freq}(A_iA_j) \times \text{freq}(B_kB_l) \times \cdots \tag{1.21}$$

where $A_1, A_2, \ldots$ are the alleles possible at locus $A$ partially
determining this character, $B_1, B_2, \ldots$ are the alleles possible at a
second locus $B$ partially determining this character, and so on. Under random
mating, equation (1.21) implies that the frequency of any chromosome, or
gamete, can be written as the product of the frequencies of its constituent
alleles.
In this case the additive genetic variance $\tau^2$ can be found, as we show
later (Section 7.3.3), by simply summing the single-locus additive variances at
the various individual loci (that is, in an obvious notation,
$\tau^2 = \sum \sigma_A^2$), with a similar result for the total variance
($\omega^2 = \sum \sigma^2$). Our notation here is informal and is different
from Fisher’s. Thus assuming that (1.21) is true, the correlation in the
character measured between father and son becomes

$$\text{corr(father, son)} = \frac{1}{2}\,\frac{\tau^2}{\omega^2}, \tag{1.22}$$

which is the natural generalization of (1.17). Similar values arise for the 
other relationships although, as will be observed in Chapter 7, the formulas 
for these other correlations often depend on the recombination structure 
between the loci determining the character. It is quite possible that while 
these results are true only when the character in question is not correlated 
with fitness, these values yield a satisfactory approximation even when 
there is some such correlation. 

So far we have not taken any account of environmental variance. In prac- 
tice it is difficult to do this, because of the unknown but presumably high 
environmental correlation for father and son, for brother and brother, and 
so on. Ignoring the possibilities of such environmental correlation, Fisher 
used formulae such as those above, in conjunction with observed correla- 
tions, to estimate the various components of variance in any character. We 
do not pursue the details of this here, and more will be said on this matter in
Chapter 8. It is sufficient to note at this stage that at least under simplified
assumptions, the genetic component of the correlation between relatives is 
given in terms of some function of the additive and the dominance vari- 
ances in the measurement of interest, and that the pattern of correlations 
predicted by the Mendelian mechanism agreed, for the data used by Fisher, 
reasonably well with those observed. As a result, Fisher had made a most 
significant beginning in reconciling biometry and Mendelism and for fusing 
these two into one discipline. From this point on population genetics, as the 
inheritor jointly of the Darwinian and the Mendelian theories, could start 
on a firm quantitative basis. Further, as we see in Section 1.4, the same 
variables used so effectively by Fisher in this reconciliation are, remarkably, 
central to the mathematical description of the evolutionary process. 



1.4 Evolution 

1.4.1 The Deterministic Theory 

We turn now to the evolutionary consequences of Mendelism. The twin cor- 
nerstones of the Darwinian theory of evolution are variation and natural 
selection. Variation is provided, under a Mendelian system, ultimately by 
mutation: In all natural populations mutation provides a continual source 
of genetic variation. Since the different genotypes created by mutation will 
often have different fitnesses, that is will differ in viability, mating success, 
and fertility, natural selection will occur. Our task is to quantify this pro- 
cess, and we now outline the work done during the 1920s and 1930s in this 
direction. Such a quantification amounts to a scientific description of the 
Darwinian theory in Mendelian terms. 

It is necessary, at least as a first step, to make a number of assumptions 
and approximations about the evolutionary process. Thus although muta- 
tion is essential for evolution, mutation rates are normally so small that 
for certain specific problems we may ignore mutational events. Further, al- 
though the fitness of an individual is determined in a complex way by his 
entire genetic make-up, and even then will often differ from one environ- 
ment to another, we start by assuming as a first approximation that this 
fitness depends on his genotype at a single locus, or at least can be found 
by “summing” single locus contributions to fitness. It is also difficult to 
cope with that component of fitness which relates to fertility, and almost 
always special assumptions are made about this. More complete discussions 
of these problems will be given later in this book. If fitness relates solely to 
viability then much of the complexity is removed, and for convenience we 
make this assumption, at least for the moment. 

Suppose then that the fitnesses and the frequencies of the three genotypes
$A_1A_1$, $A_1A_2$, and $A_2A_2$ at a certain locus “A” are as given below:

               $A_1A_1$     $A_1A_2$      $A_2A_2$
fitness        $w_{11}$     $w_{12}$      $w_{22}$
frequency      $x^2$        $2x(1-x)$     $(1-x)^2$



We have written the frequencies of these genotypes in the Hardy-Weinberg
form appropriate to random mating. (Non-random-mating populations are
discussed in Section 1.4.2.) Now Hardy-Weinberg frequencies apply only
at the moment of conception, since from that time on differential viabilities
alter genotype frequencies from the Hardy-Weinberg form. For this reason
we will always, in this book, count frequencies in the population at the
moment of conception of each generation.

Clearly the most interesting question to ask is: What is the behavior of 
the frequency $x$ of the allele $A_1$ under natural selection? Since we take the
fundamental units of the microevolutionary process to be the replacement 
in a population of an “inferior” allele by a “superior” allele, the answer 
to this question is essential to an understanding of the microevolutionary 
process. 

This question was first attacked in certain specific cases by Norton (see
Punnett, 1917), and later in much greater detail by Haldane (1924, 1926, 
1927a, 1927b, 1930a, 1930b, 1932a) with a summary in Haldane (1932b). 
We consider here only the simplest of these cases. Before doing so, we 
observe that we are required to explain two seemingly contradictory phe- 
nomena. On the one hand we must explain the dynamic process of the 
substitution of one allele for another and, on the other hand, we must 
explain the observed existence of considerable, apparently stable, genetic 
polymorphism. 

The first concern is to find the frequency $x'$ of $A_1$ in the following
generation. By considering the fitnesses of each individual and all possible
matings, we find that

$$x' = x + \frac{x(1-x)\{w_{11}x + w_{12}(1-2x) - w_{22}(1-x)\}}{w_{11}x^2 + 2w_{12}x(1-x) + w_{22}(1-x)^2}. \tag{1.24}$$



Clearly continued iteration of the recurrence relation (1.24) yields the
successive values taken by the frequency of $A_1$. Unfortunately simple explicit
expressions for these frequencies are not always available, and resort must 
be made to approximation. 
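Although explicit expressions are not generally available, iterating the recurrence numerically is straightforward. The sketch below writes the recurrence in the form $x' = \{w_{11}x^2 + w_{12}x(1-x)\}/\bar{w}$, where $\bar{w}$ is the mean fitness $w_{11}x^2 + 2w_{12}x(1-x) + w_{22}(1-x)^2$; this is algebraically the same as adding the change $x(1-x)\{w_{11}x + w_{12}(1-2x) - w_{22}(1-x)\}/\bar{w}$ to $x$. The function names, the fitness values and the starting frequency are illustrative assumptions.

```python
# Sketch: iterate the one-locus, two-allele viability-selection recurrence.
# Names and numerical inputs are hypothetical.

def next_frequency(x, w11, w12, w22):
    """Frequency of A1 in the next generation."""
    mean_w = w11 * x**2 + 2 * w12 * x * (1 - x) + w22 * (1 - x) ** 2
    return (w11 * x**2 + w12 * x * (1 - x)) / mean_w


def trajectory(x0, w11, w12, w22, generations):
    """Frequencies in generations 0, 1, ..., generations."""
    path = [x0]
    for _ in range(generations):
        path.append(next_frequency(path[-1], w11, w12, w22))
    return path


# Assumed example: A1A1 fittest, heterozygote intermediate.
path = trajectory(0.01, 1.02, 1.01, 1.00, 1000)
for generation in (0, 250, 500, 750, 1000):
    print(generation, round(path[generation], 4))
```

Multiplying all three fitnesses by the same constant leaves `next_frequency` unchanged, since numerator and denominator scale together.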

Before discussing these approximations, we observe that $x'$ depends on
the ratios of the fitnesses $w_{ij}$ rather than the absolute values, so that
$x'$ is unchanged if we multiply each $w_{ij}$ by any convenient scaling
constant. It is therefore possible to scale the $w_{ij}$ in any way convenient
to the analysis at hand. Different scalings are more convenient for different
purposes. We indicate below two alternative scalings of the fitness values
$w_{ij}$, and on different occasions either (1.25a), (1.25b), or (1.25c) will
prove to be the most useful. It should be emphasized that nothing is involved
here other than convenience of notation.



Fitness Values

            $A_1A_1$      $A_1A_2$      $A_2A_2$
            $w_{11}$      $w_{12}$      $w_{22}$       (1.25a)
            $1 + s$       $1 + sh$      $1$            (1.25b)
            $1 - s_1$     $1$           $1 - s_2$      (1.25c)



We normally assume that except in extreme cases, perhaps involving lethality,
the fitness differentials $s$, $sh$, $s_1$ and $s_2$ are small, perhaps of the
order of 1%. In this case we ignore small-order terms in these parameters.

Using the fitness scheme (1.25b), the recurrence relation (1.24) may be
replaced, to a sufficiently close approximation, by

$$x' - x = sx(1-x)\{x + h(1-2x)\}. \tag{1.26}$$

If we measure time in units of one generation, this equation may be
approximated, in turn, by

$$dx/dt = sx(1-x)\{x + h(1-2x)\}. \tag{1.27}$$

If the time required for the frequency of $A_1$ to move from some value $x_1$
to some other value $x_2$ is denoted by $t(x_1, x_2)$, then clearly

$$t(x_1, x_2) = \int_{x_1}^{x_2} \left(sx(1-x)\{x + h(1-2x)\}\right)^{-1} dx. \tag{1.28}$$

Naturally this equation applies only in cases where, starting from $x_1$, the
frequency of $A_1$ will eventually reach $x_2$.

While an explicit expression for $t(x_1, x_2)$ is possible, it is usually more
convenient to use the expression (1.28) directly. Suppose first that
$s > sh > 0$. Then it is clear from (1.27) that the frequency of $A_1$ steadily
increases towards unity. However, as this frequency approaches unity, the
time required for even small changes in it will be large, due to the small
term $1 - x$ in the denominator of the integrand in (1.28). This behavior is
even more marked in the case $h = 1$ ($A_1$ dominant to $A_2$ in fitness), for
then the denominator in the integrand in (1.28) contains a multiplicative term
$(1 - x)^2$. This very slow rate of increase is due to the fact that, once $x$
is close to unity, the frequency of $A_2A_2$, the genotype against which
selection is operating, is extremely low. In the important particular case
$h = \tfrac{1}{2}$, that is no dominance in fitness, (1.28) assumes the simple
form

$$t(x_1, x_2) = \int_{x_1}^{x_2} \left\{\tfrac{1}{2}sx(1-x)\right\}^{-1} dx. \tag{1.29}$$

Table 1.2. Generations spent in various frequency ranges

                                     Range
 h      0.001-0.01   0.01-0.1   0.1-0.5   0.5-0.9   0.9-0.99   0.99-0.999
 1/2       462          480        439       439       480         462
 1         232          250        309     1,020     9,240      90,231



It is possible to evaluate the times required for any nominated changes in
the frequency of $A_1$ from (1.28) and (1.29), and some representative values
are given in Table 1.2.
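The entries of Table 1.2 can be recovered by evaluating the integral (1.28) numerically. The sketch below uses composite Simpson's rule, a choice made here for simplicity rather than the method used to compile the table. The value $s = 0.01$ is an assumption: the table as reproduced here does not state the selective difference used, but this value gives the tabulated entries after rounding. All function names are hypothetical.

```python
# Sketch: generations needed to move between frequencies, from the integral
# of 1 / (s x (1-x) {x + h(1-2x)}). Assumes s = 0.01 (not stated with the
# table); Simpson's rule and all names are illustrative choices.

def integrand(x, s, h):
    return 1.0 / (s * x * (1.0 - x) * (x + h * (1.0 - 2.0 * x)))


def generations(x1, x2, s, h, panels=20000):
    """Composite Simpson approximation to t(x1, x2); panels must be even."""
    width = (x2 - x1) / panels
    total = integrand(x1, s, h) + integrand(x2, s, h)
    for k in range(1, panels):
        weight = 4.0 if k % 2 else 2.0
        total += weight * integrand(x1 + k * width, s, h)
    return total * width / 3.0


RANGES = [(0.001, 0.01), (0.01, 0.1), (0.1, 0.5),
          (0.5, 0.9), (0.9, 0.99), (0.99, 0.999)]

for h in (0.5, 1.0):
    row = [round(generations(a, b, 0.01, h)) for a, b in RANGES]
    print("h =", h, row)
```

For $h = \tfrac{1}{2}$ the integral has the closed form $(2/s)\log[x_2(1-x_1)/\{x_1(1-x_2)\}]$, which explains the symmetry of the first row of the table about the frequency one half.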

The times shown in Table 1.2 support the conclusions just given and 
show that while selection acts so that variation is ultimately destroyed, 
the times required are usually very long, and are much longer than those 
required under any blending theory of inheritance. We may therefore often 
expect to observe considerable genetic polymorphism in populations even 
though they are subject to directional natural selection. We shall find sev- 
eral uses later for this table and its various generalizations. The papers by 
Haldane referred to above provide values analogous to those in Table 1.2 in 
increasingly complex conditions, for example inbreeding, the case of differ- 
ent sets of fitnesses in the two sexes. Clearly this procedure quantifies, at 
least approximately, the unit microevolutionary process of the replacement 
of an “inferior” allele by a “superior” allele. 

It is clear that if $s < sh < 0$ a process parallel to the above, with $A_2$
steadily replacing $A_1$, will occur. This process is a mirror image of the one
just considered and needs no further comment.

An entirely different behavior arises when the fitness $w_{12}$ of the
heterozygote exceeds the fitnesses of both of the homozygotes. This case is
most conveniently treated by using the fitness parameters (1.25c) with
$s_1 > 0$, $s_2 > 0$. Here the recurrence relation (1.24) may be rewritten,
to a sufficiently close approximation, as

$$x' - x = x(1-x)\{s_2 - x(s_1 + s_2)\}. \tag{1.30}$$

It is clear that there will be no change in the frequency $x$ of $A_1$ if $x$
takes the particular value

$$x = x^* = \frac{s_2}{s_1 + s_2} = \frac{w_{22} - w_{12}}{w_{11} + w_{22} - 2w_{12}}. \tag{1.31}$$

Further, if $x < x^*$, then $x < x' < x^*$, while if $x > x^*$, then
$x^* < x' < x$. Thus $x^*$ is a point of stable equilibrium and, whatever its
initial value, the frequency $x$ of $A_1$ will steadily approach $x^*$. It is
not difficult to see that if the heterozygote is the least fit genotype, so
that $s_1 < 0$, $s_2 < 0$, then $x^*$ is still an equilibrium point of the
recurrence system (1.24), but in this case it is an unstable equilibrium and
thus of little interest. In this case the frequency of $A_1$ will steadily
decrease to zero if its initial value is less than $x^*$ and will steadily
increase to unity if initially greater than $x^*$.

The above considerations taken together show that a necessary and
sufficient condition that there exist a stable equilibrium of the frequency of $A_1$
in the interval (0, 1) is that the heterozygote have a larger fitness than both 
homozygotes. This most important fact was established by Fisher (1922), 
and gives one possible explanation for the occurrence of stable allelic fre- 
quencies in a population. Later we shall find a number of other possible 
explanations: For the moment we simply observe that under the Mendelian 
system we can explain the occurrence of both dynamic substitutional pro- 
cesses and static equilibrium configurations. Thus, by the 1920s the first 
major steps were already being taken to explain in Mendelian terms, and 
also to quantify, what are perhaps the two major properties of biological 
populations, namely their capacity to evolve and their capacity to maintain 
static variation over long periods. 
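The contrast between the stable equilibrium under heterozygote advantage and the unstable one under heterozygote disadvantage can be seen by iterating the selection recurrence from different starting points. In this sketch the recurrence is written as $x' = \{w_{11}x^2 + w_{12}x(1-x)\}/\bar{w}$ with $\bar{w}$ the mean fitness; the function names, fitness values, starting frequencies and number of generations are illustrative assumptions.

```python
# Sketch: stable versus unstable interior equilibrium under selection.
# All names and numerical inputs are hypothetical.

def next_frequency(x, w11, w12, w22):
    mean_w = w11 * x**2 + 2 * w12 * x * (1 - x) + w22 * (1 - x) ** 2
    return (w11 * x**2 + w12 * x * (1 - x)) / mean_w


def iterate(x0, w11, w12, w22, generations):
    x = x0
    for _ in range(generations):
        x = next_frequency(x, w11, w12, w22)
    return x


def interior_equilibrium(w11, w12, w22):
    """Equilibrium (w22 - w12) / (w11 + w22 - 2 w12), as in (1.31)."""
    return (w22 - w12) / (w11 + w22 - 2 * w12)


# Heterozygote advantage (s1 = 0.02, s2 = 0.03): approach from both sides.
advantage = (0.98, 1.00, 0.97)
print(interior_equilibrium(*advantage))
print(iterate(0.05, *advantage, 5000), iterate(0.95, *advantage, 5000))

# Heterozygote disadvantage: the same formula gives an unstable point.
disadvantage = (1.02, 1.00, 1.02)
print(interior_equilibrium(*disadvantage))
print(iterate(0.49, *disadvantage, 5000), iterate(0.51, *disadvantage, 5000))
```

In the first case both runs should settle at the interior equilibrium; in the second, starting points on either side of it should be driven to loss and to fixation of $A_1$ respectively.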

We now consider the effect of mutation. Suppose that $A_1$ mutates to $A_2$
at a rate $u$ and that $A_2$ mutates to $A_1$ at rate $v$. Then it is easy to
see that if there is no selection,

$$x' = x(1-u) + v(1-x), \tag{1.32}$$

and that a stable equilibrium is reached when

$$x = x^* = \frac{v}{u+v}. \tag{1.33}$$

Suppose now that both selection and mutation occur. We have in mind
mainly the case where selective differences are of order $10^{-2}$ while
mutation rates are of order $10^{-5}$ or $10^{-6}$. Consider first the case
where heterozygote selective advantage exists so that under selection only, a
stable equilibrium of the form (1.31) exists. It is clear under this
assumption that if selection and mutation are now both taken into account
there will exist a new stable equilibrium differing only trivially from that
given by (1.31). We thus do not consider this case any further.

We next consider the case where $A_1A_1$ is the most fit genotype and $A_2A_2$
the least fit. Under the fitness scheme (1.25b), this assumption implies that
$s > sh > 0$, and because mutation rates are assumed to be considerably
smaller than fitness differentials, selective forces dominate mutation
pressures for all but extreme frequencies of $A_1$. Because of this there will
exist a stable equilibrium point for the frequency of $A_1$ close to unity.
More exactly we find, for this equilibrium point, the approximate formula

$$x = x^* = 1 - \frac{u}{s - sh} \tag{1.34}$$

for the equilibrium frequency of $A_1$. If $s > 0$ and $h = 1$ ($A_1$ dominant
to $A_2$), the corresponding formula is

$$x = x^* = 1 - \sqrt{u/s}. \tag{1.35}$$

Parallel formulas apply when $s < sh < 0$: Here we find, at equilibrium,

$$x = x^* = \frac{v}{|sh|}, \tag{1.36}$$

while when $s < 0$, $h = 0$,

$$x = x^* = \sqrt{v/|s|}. \tag{1.37}$$



All these formulas were arrived at during the 1920s. They imply a second
way in which genetic variation may be maintained in a population, that is
by “mutation-selection balance”. However, the frequency of one or other
allele will be very small for any of the equilibria (1.34)-(1.37), although the
frequency of the less frequent allele is less small where dominance is
complete. Thus, when $s = 0.01$, $u = 10^{-6}$, the frequency of $A_2$ at
equilibrium will be 0.01 when $h = 1$ (complete dominance) and 0.0002 when
$h = \tfrac{1}{2}$ (no dominance).
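These two equilibrium frequencies can be checked by direct iteration. The sketch below makes several assumptions that the text does not specify: within each generation selection acts first and mutation second, the reverse mutation rate $v$ is set to zero, and the population starts at frequency one half. The function names are hypothetical. Since (1.34) and (1.35) are approximations, agreement to within a few percent is all that should be expected.

```python
# Sketch: mutation-selection balance by iteration under fitnesses
# 1+s, 1+sh, 1. Assumed: selection then mutation each generation, v = 0,
# starting frequency 0.5. Names are hypothetical.
import math


def select(x, w11, w12, w22):
    mean_w = w11 * x**2 + 2 * w12 * x * (1 - x) + w22 * (1 - x) ** 2
    return (w11 * x**2 + w12 * x * (1 - x)) / mean_w


def mutate(x, u, v):
    """A1 -> A2 at rate u, A2 -> A1 at rate v."""
    return x * (1 - u) + v * (1 - x)


def equilibrium_a2_frequency(s, h, u, v=0.0, generations=200_000, x0=0.5):
    x = x0
    for _ in range(generations):
        x = mutate(select(x, 1 + s, 1 + s * h, 1.0), u, v)
    return 1 - x


s, u = 0.01, 1e-6
print(equilibrium_a2_frequency(s, 0.5, u), u / (s - s * 0.5))
print(equilibrium_a2_frequency(s, 1.0, u), math.sqrt(u / s))
```

The long run length is needed mainly for the dominant case, where the approach to equilibrium is slow because selection against the rare recessive homozygote is very weak.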

We now consider further properties of mutation-selection equilibria such 
as (1.34), where the less frequent allele is quite rare and is maintained only 
by recurrent mutation from the favored allele. Under the fitness scheme 
(1.25b) the equilibrium mean fitness of the population would be $1 + s$ if
the mutation rate were zero, since in this case $A_1$ would fix in the
population. The occurrence of mutation causes the mean fitness to decrease
somewhat from this value. So long as $h < 1$, this decrease is found, to a
close approximation, to be $2u$. For $h = 1$ a somewhat different calculation,
using (1.35), gives a decrease of $u$, and for values of $h$ close to 1 a value
closer to $u$ than to $2u$ is found. In other words, the population suffers a
decrease in mean fitness proportional to the mutation rate, but not to fit- 
ness differentials. Haldane (1937), who first obtained this result, made the 
assertion that this situation has been reached in present-day populations 
by evolutionary modification of the mutation rate, so that a small current 
decrease in mean fitness is traded off against an increase in genetic plas- 
ticity in the population suitable for possible future evolution. We term the 
loss in mean fitness the “mutational load” and later consider this and more 
general forms of genetic load in more detail. 

We have observed earlier that the Mendelian system of heredity enables 
us to quantify, at least as a first approximation, the rate of allelic substitu- 
tion in an evolutionary process. Is it possible to arrive at general principles, 
derived from the Mendelian system, which quantify the two main features 
of an evolutionary process through Darwinian natural selection, namely 
the requirement of variation for evolution to occur and second, the “im- 
provement” brought about in a population through this evolution? In his 
Fundamental Theorem of Natural Selection (FTNS), Fisher (1930a, 1958) 
attempted to find such a principle. His presentation of this theorem was 
very obscure. The “conventional wisdom” version of this theorem, outlined 
below, is clearly not what he intended, but is nevertheless an interesting 
result. It is called here the “mean fitness increase theorem” (MFIT).

Consider a random-mating population where the fitness of any individual
depends only on his genetic constitution at a single locus “A”. Suppose that
two alleles, $A_1$ and $A_2$, are possible at this locus and that the fitnesses
of the three possible genotypes are as given in (1.25a). The population is
assumed to reproduce in nonoverlapping generations, so that (1.24) is
applicable. In any generation we may define the mean fitness $\bar{w}$ of the
population in that generation by

$$\bar{w} = w_{11}x^2 + 2w_{12}x(1-x) + w_{22}(1-x)^2, \tag{1.38}$$

where $x$ is the frequency of $A_1$ in that generation. The frequency $x'$ of
$A_1$ in the following generation can be found from (1.24), and thus the mean
fitness $\bar{w}'$ in that generation can be computed as

$$\bar{w}' = w_{11}(x')^2 + 2w_{12}x'(1-x') + w_{22}(1-x')^2. \tag{1.39}$$

From this the change $\Delta\bar{w} = \bar{w}' - \bar{w}$ in mean fitness
between these two generations is given exactly by

$$
\begin{aligned}
\Delta\bar{w} = {} & 2x(1-x)\{w_{11}x + w_{12}(1-2x) - w_{22}(1-x)\}^2 \\
& \times \{w_{11}x^2 + (w_{12} + \tfrac{1}{2}w_{11} + \tfrac{1}{2}w_{22})x(1-x) + w_{22}(1-x)^2\}\,\bar{w}^{-2}.
\end{aligned} \tag{1.40}
$$

Clearly Aw is nonnegative, so we may conclude that natural selection acts 
so as to increase, or at worst maintain, the mean fitness of the population. 
This is the first part of the MFIT, and in the very restricted case considered 
it provides a quantification in genetic terms of the Darwinian concept that 
an “improvement” in the population has been brought about by the action 
of natural selection. 

We may also use (1.40) to quantify the second part of the Darwinian 
principle that variation, in our case genetic variation, is necessary for
natural selection to operate. If the $w_{ij}$ are all close to unity we may
write, to a sufficiently close approximation,

$$\Delta\bar{w} \approx 2x(1-x)\{w_{11}x + w_{12}(1-2x) - w_{22}(1-x)\}^2. \tag{1.41}$$

The definition in (1.9) for the additive genetic variance in fitness then shows
immediately that

$$\Delta\bar{w} \approx \sigma_A^2. \tag{1.42}$$

This approximation quantifies in genetic terms the second major element 
of the Darwinian theory, and correspondingly of the MFIT, namely that 
the rate of increase of mean fitness is essentially equal to the additive 
component of the genetic variance in fitness. 
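Both parts of the MFIT can be checked numerically: the exact expression (1.40) against a direct computation of the change in mean fitness over one generation, and the approximation by the additive genetic variance when fitnesses are close to unity. The sketch below writes the selection recurrence as $x' = \{w_{11}x^2 + w_{12}x(1-x)\}/\bar{w}$; all function names and numerical inputs are illustrative assumptions.

```python
# Sketch: change in mean fitness over one generation of selection.
# Names and numerical inputs are hypothetical.

def mean_fitness(x, w11, w12, w22):
    return w11 * x**2 + 2 * w12 * x * (1 - x) + w22 * (1 - x) ** 2


def next_frequency(x, w11, w12, w22):
    return (w11 * x**2 + w12 * x * (1 - x)) / mean_fitness(x, w11, w12, w22)


def delta_direct(x, w11, w12, w22):
    """Mean fitness next generation minus mean fitness now."""
    x_next = next_frequency(x, w11, w12, w22)
    return mean_fitness(x_next, w11, w12, w22) - mean_fitness(x, w11, w12, w22)


def additive_variance(x, w11, w12, w22):
    bracket = w11 * x + w12 * (1 - 2 * x) - w22 * (1 - x)
    return 2 * x * (1 - x) * bracket**2


def delta_formula(x, w11, w12, w22):
    """Exact expression for the change in mean fitness, as in (1.40)."""
    factor = (
        w11 * x**2
        + (w12 + 0.5 * w11 + 0.5 * w22) * x * (1 - x)
        + w22 * (1 - x) ** 2
    )
    mean_w = mean_fitness(x, w11, w12, w22)
    return additive_variance(x, w11, w12, w22) * factor / mean_w**2


weak = (1.02, 1.01, 1.00)     # fitnesses close to unity
strong = (2.0, 0.5, 1.0)      # deliberately far from unity
print(delta_direct(0.3, *weak), delta_formula(0.3, *weak), additive_variance(0.3, *weak))
print(delta_direct(0.2, *strong), delta_formula(0.2, *strong), additive_variance(0.2, *strong))
```

In the weak-selection example the three numbers should be nearly equal; in the strong-selection example the first two should still agree but the additive variance need no longer be a good approximation.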

One might initially have thought that the total variance in fitness, namely

$$\sigma^2 = w_{11}^2x^2 + 2w_{12}^2x(1-x) + w_{22}^2(1-x)^2 - \bar{w}^2, \tag{1.43}$$

rather than the additive component of the variance, should be related to
the increase in mean fitness. There are at least two arguments that show
that this is not so. First, if the fitness values are of the form (1.25c) with
$s_1, s_2 > 0$, and if the population is at the equilibrium point (1.31), then the
total variance in fitness will be positive and yet, because the population is 
at equilibrium, there will be no increase in mean fitness from one generation 
to the next. Second, and related to the first argument, the additive com- 
ponent of the genetic variance is that portion explained by “genes within 
genotypes” when these are freed, as far as is possible, from deviations due 
to dominance. Since, in the model we consider, changes in gene frequencies 
are the fundamental components of evolution, the rate of increase of mean 
fitness can be expected to be related to that component of the total ge- 
netic variance which is accounted for by the alleles themselves, that is the 
additive genetic, or genic, variance. 

The MFIT is not the FTNS. The Fundamental Theorem in its full 
generality is deeper, more general and more complex than the MFIT. In 
particular, it applies in cases when mating is not at random and also when 
the fitness of any individual depends on his entire genomic make-up, not 
simply his genetic make-up at one single locus. In both these cases the 
MFIT breaks down. Because the FTNS is so general, we defer its exposition
and proof to Chapter 2 (for the one-locus case) and Chapter 7 (for
the many-locus case), where the machinery needed for it is developed.

As stated above, the MFIT does not hold as a theorem under non-random
mating or when fitnesses depend on the genes at many loci. The fact
that the MFIT does not hold when mating is not at random, implying
non-Hardy-Weinberg frequencies in the parental generation, is immediately
apparent. Suppose that the fitness of $A_1A_1$ is 1, that the fitness of $A_1A_2$ is
0.6 and the fitness of $A_2A_2$ is 1, and that some form of non-random mating
has occurred so that in some parental generation, half the individuals in
the population are $A_1A_1$ and half are $A_2A_2$. Then the mean fitness is 1,
and if mating is such that heterozygotes appear in the daughter generation,
the mean fitness will decrease. Thus in this case the MFIT breaks down
as a mathematical theorem.
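
The arithmetic of this counterexample is easily made explicit. The sketch below assumes, purely for illustration, that the daughter generation is formed by one round of random mating, so that its genotype frequencies at conception are in Hardy-Weinberg form; the text requires only that some heterozygotes appear.

```python
def mean_fitness(freqs, fitnesses):
    """Mean fitness for genotype frequencies listed as (A1A1, A1A2, A2A2)."""
    return sum(f * w for f, w in zip(freqs, fitnesses))


fitnesses = (1.0, 0.6, 1.0)
parents = (0.5, 0.0, 0.5)              # no heterozygotes in the parental generation

x = parents[0] + 0.5 * parents[1]      # frequency of A1 among the parents
# Assumption: random mating produces the daughter generation.
daughters = (x**2, 2 * x * (1 - x), (1 - x)**2)

print(mean_fitness(parents, fitnesses), mean_fitness(daughters, fitnesses))
```

Under this assumption the mean fitness falls from 1 to 0.8 in a single generation.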

It is less immediately apparent that decreases in mean fitness can arise 
even under random mating if the fitness of any individual depends on the 
alleles at several loci. This case requires a more complex analysis than that
considered here, and is deferred to Chapter 7. 

1.4.2 Non-Random-Mating Populations

Essentially all the theory above, and indeed the theory in most of this book, 
assumes a random-mating population. This reflects in part the theory in 
the literature as it now exists, and also a focus on animal populations. 
However the human population does not mate at random, and it is thus 
relevant to consider, at least briefly, some of the consequences to the theory 
when a population does not mate at random. 

There are many forms of non-random mating, and here we consider one 
which brings out some of the salient features of this form of mating. Suppose
that the frequencies of the three genotypes in some parental generation are
as given in (1.44).

              $A_1A_1$     $A_1A_2$     $A_2A_2$
frequency     $X_{11}$     $2X_{12}$    $X_{22}$          (1.44)

Suppose now that an individual mates specifically with an individual of 
the same genotype with probability $f$, and mates at random, possibly with
an individual of the same genotype, with probability $1 - f$. By considering
all possible matings, their frequencies and their genetic outputs, it is found 
that the genotype frequencies in the daughter generation are given as in 
(1.45). 



$A_1A_1$:  $f(X_{11} + \tfrac{1}{2}X_{12}) + (1 - f)x^2$
$A_1A_2$:  $fX_{12} + 2(1 - f)x(1 - x)$                              (1.45)
$A_2A_2$:  $f(\tfrac{1}{2}X_{12} + X_{22}) + (1 - f)(1 - x)^2$

Here $x = X_{11} + X_{12}$ is the frequency of $A_1$ in the parental generation.

The daughter generation values can be used for several purposes. First,
they show that the frequency of the allele $A_1$ in the daughter generation is the
same as that in the parental generation. Thus this frequency remains constant
throughout the evolutionary process. Second, they can be updated
to find the various genotype frequencies in the following generation. Finally,
by equating parental and daughter generation genotype frequencies
we find the asymptotic ($t \to \infty$) values. This limiting process shows that
the asymptotic heterozygote frequency $H$ is given by

$$H = \frac{4(1 - f)x(1 - x)}{2 - f}, \tag{1.46}$$

while the two asymptotic homozygote frequencies are

$$(A_1A_1):\ x - \tfrac{1}{2}H, \qquad (A_2A_2):\ 1 - x - \tfrac{1}{2}H. \tag{1.47}$$



All these genotype frequencies are positive, and their values confirm that
the asymptotic frequencies of $A_1$ and $A_2$ are at the original parental values
$x$ and $1 - x$ respectively. Thus in the sense that allelic frequencies
are maintained, a central conclusion deriving from the Hardy-Weinberg
law concerning the preservation of genetic variation also holds for this
non-random-mating population. One generation of random mating would
immediately restore Hardy-Weinberg genotype frequencies. On the other
hand, the variation that is maintained is to some extent cryptic, since
the heterozygote frequency is less than that applying for a random-mating
population with the same allelic frequencies.
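
The recurrence implied by the daughter generation frequencies, and the limiting value (1.46), can be illustrated by direct iteration. The sketch below takes the daughter frequencies to be $f(X_{11} + \tfrac{1}{2}X_{12}) + (1 - f)x^2$, $fX_{12} + 2(1 - f)x(1 - x)$ and $f(\tfrac{1}{2}X_{12} + X_{22}) + (1 - f)(1 - x)^2$; the starting frequencies and the value of $f$ are arbitrary illustrative choices.

```python
def next_generation(X11, X12, X22, f):
    """One generation of the mixed mating scheme.

    Genotype frequencies are X11, 2*X12 and X22, as in (1.44).
    """
    x = X11 + X12                                   # frequency of A1
    new_X11 = f * (X11 + 0.5 * X12) + (1 - f) * x**2
    new_het = f * X12 + 2 * (1 - f) * x * (1 - x)   # total A1A2 frequency
    new_X22 = f * (0.5 * X12 + X22) + (1 - f) * (1 - x)**2
    return new_X11, 0.5 * new_het, new_X22


def asymptotic_heterozygosity(x, f):
    """Limiting heterozygote frequency (1.46)."""
    return 4 * (1 - f) * x * (1 - x) / (2 - f)


f = 0.4
state = (0.5, 0.1, 0.3)        # X11, X12, X22: 0.5 + 2(0.1) + 0.3 = 1
x0 = state[0] + state[1]
for _ in range(60):
    state = next_generation(*state, f)

H = 2 * state[1]
print(state[0] + state[1], x0)              # allele frequency is unchanged
print(H, asymptotic_heterozygosity(x0, f))  # heterozygosity has converged
```

The heterozygote frequency approaches its limit geometrically at rate $f/2$ per generation, so convergence is rapid unless $f$ is close to 1.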

Because of the preservation of variation, even though this is to some 
extent cryptic variation, in the non-random-mating case, we pay com- 
paratively little attention to this case in this book, certainly less than is 
appropriate.

1.4.3 The Stochastic Theory

In this section we consider an aspect of evolutionary behavior which was 
considered at some length by Fisher, Haldane and Wright, namely the effect 
of the finite size of the population considered. This finiteness implies that 
changes in gene frequencies must be viewed as being part of a stochastic, 
rather than a deterministic, process. It is necessary, in order to arrive at a 
theoretical estimate of the importance of the stochastic factor, to set up a 
stochastic model which reasonably describes the behavior of a population in 
the stochastic case. Perhaps more than in any other part of the theory the 
choice of a model here is somewhat arbitrary, and we do not pretend that 
Nature necessarily follows at all closely the models we construct. (Modeling 
in population genetics is discussed further in Section 1.6.) Although they 
did not use the terminology of Markov chain theory, the methods used by 
Fisher and Wright are in fact those of this theory and its close relative, 
diffusion theory. A brief summary of parts of Markov chain theory is given 
in Section 2.12. We anticipate here some of the results given in that section, 
and present the conclusions of Fisher and Wright in the terminology of 
Markov chains. 

We consider, as the simplest possible case, a diploid population of fixed
size $N$. Suppose that the individuals in this population are monoecious, that
no selective differences exist between the two alleles $A_1$ and $A_2$ possible at
a certain locus “A,” and that there is no mutation. There are $2N$ genes in
the population in any generation, and it is sufficient to center our attention
on the number $X$ of $A_1$ genes. Clearly in any generation $X$ takes one or
other of the values $0, 1, \ldots, 2N$, and we denote the value assumed by $X$ in
generation $t$ by $X(t)$.

We must now assume some specific model which describes the way in 
which the genes in generation t + 1 are derived from the genes in generation 
t. Clearly many reasonable models are possible and, for different purposes, 
different models might be preferable. We discuss various possible models 
later in this book. Naturally, biological reality should be the main criterion
in our choice of model, but we shall also consider mathematical convenience 
in this choice. The model which we consider assumes that the genes in 
generation t + 1 are derived by sampling with replacement from the genes 
of generation $t$. This means that the number $X(t + 1)$ is a binomial random
variable with index $2N$ and parameter $X(t)/2N$. More explicitly, given that
$X(t) = i$, the probability $p_{ij}$ that $X(t + 1) = j$ is assumed to be given by

$$p_{ij} = \binom{2N}{j}(i/2N)^j\{1 - (i/2N)\}^{2N-j}, \quad i, j = 0, 1, 2, \ldots, 2N. \tag{1.48}$$

While the model in this form was not written down explicitly by Fisher 
and Wright, it is clear that it was known to Fisher (1921), (1930a) and 
Wright (1931), who explicitly gave several formulas deriving from it. While
the model apparently originated with Fisher, we follow the common practice
of honoring both authors by calling it the Wright-Fisher model.

More precisely, we shall refer to the model (1.48) as the “simple” Wright- 
Fisher model, since it does not incorporate selection, mutation, population 
subdivision, two sexes or any other complicating feature. The purpose of 
introducing it is to allow an initial examination of the effects of stochastic 
variation in gene frequencies, without any further complicating features 
being involved. More complicated models, such as (1.58), (1.66), (3.68) and
(3.72), that introduce factors such as selection and mutation or allow more
than two alleles, but which share the binomial sampling characteristic of
(1.48), will all be referred to generically as “Wright-Fisher” models.

We emphasize that all of these models are no more than crude approxi- 
mations to biological reality. This fact is expanded upon in Sections 1.6 and 
3.7. Later in this book we will introduce other models having properties 
different from those of Wright-Fisher models. 

In the form of (1.48), it is clear that $X(\cdot)$ is a Markovian random variable
with transition matrix $P = \{p_{ij}\}$, so that in principle the entire probability
behavior of $X(\cdot)$ can be arrived at through knowledge of $P$ and the initial
value $X(0)$ of $X$. In practice, unfortunately, the matrix $P$ does not lend
itself readily to simple explicit answers to many of the questions we would 
like to ask, and we shall be forced, later, to consider alternative approaches 
to these questions. 

On the other hand, (1.48) does enable us to make some comments more
or less immediately. Perhaps the most important is that whatever the value
$X(0)$, eventually $X(\cdot)$ will take either the value 0 or $2N$, and once this happens
there will be no further change in the value of $X(\cdot)$. Genetically this
corresponds, of course, to the fact that since the model (1.48) does not
allow mutation, once the population is purely $A_2A_2$ or purely $A_1A_1$, no
variation exists, and no further evolution is possible at this locus. It was
therefore natural for both Fisher and Wright to find, assuming the model of
(1.48), the probability of eventual fixation of $A_1$ rather than $A_2$, and perhaps
more important, to attempt to find how much time might be expected
to pass before fixation of one or other allele occurs. It is easy enough to
see that the answer to the first question is $X(0)/2N$. This conclusion may
be arrived at by a variety of methods, the one most appropriate to Markov
chain theory being that the solution $\pi_j = j/(2N)$ satisfies (2.141) and its
boundary conditions. Setting $j = X(0)$ leads to the required solution. A
second way of arriving at the value $X(0)/2N$ is to note that $X(\cdot)/2N$ is a
martingale, that is, satisfies the “invariant expectation” formula

$$E\{X(t + 1)/2N \mid X(t)\} = X(t)/2N, \tag{1.49}$$

and then use either martingale theory or informal arguments to arrive at 
the desired value. A third approach, more informal and yet from a genetical 
point of view perhaps more useful, is to observe that eventually every gene 
in the population is descended from one unique gene in generation zero.
The probability that such a gene is $A_1$ is simply the initial fraction of $A_1$
genes, namely $X(0)/2N$, and this must also be the fixation probability of
$A_1$.
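
For a small population these statements can be verified by brute force. The following sketch builds the transition matrix (1.48), checks the martingale property (1.49) row by row, and propagates an initial distribution until essentially all probability has been absorbed at 0 or $2N$. The population size, the starting value and the number of generations are arbitrary illustrative choices.

```python
from math import comb


def wright_fisher_matrix(two_n):
    """Transition matrix (1.48) of the simple Wright-Fisher model."""
    matrix = []
    for i in range(two_n + 1):
        p = i / two_n
        matrix.append([comb(two_n, j) * p**j * (1 - p)**(two_n - j)
                       for j in range(two_n + 1)])
    return matrix


def step(dist, matrix):
    """Advance a probability distribution for X by one generation."""
    size = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(size))
            for j in range(size)]


two_n = 10
P = wright_fisher_matrix(two_n)

# Martingale property (1.49): the conditional mean of X(t+1) is X(t).
conditional_means = [sum(j * P[i][j] for j in range(two_n + 1))
                     for i in range(two_n + 1)]

# Fixation probability of A1 starting from X(0) = 3.
start = 3
dist = [0.0] * (two_n + 1)
dist[start] = 1.0
for _ in range(500):
    dist = step(dist, P)

print(dist[two_n], start / two_n)
```

The probability mass at $2N$ after many generations agrees with $X(0)/2N$.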

It is far more difficult to assess the properties of the (random) time 
until fixation occurs. The most obvious quantity to evaluate is the mean
time $\bar{t}\{X(0)\}$ taken until $X(\cdot)$ reaches 0 or $2N$, starting from $X(0)$. As it
happens, no simple explicit formula for this mean time exists, although, as
we see later, some simple approximations are available. Fisher and Wright,
no doubt noting this difficulty, paid comparatively little attention to the
mean fixation time, concentrating on an approach centering around the
leading nonunit eigenvalue of $P$. It follows immediately from (1.48) that if
we put $x(t) = X(t)/2N$,

$$E(x(t + 1)\{1 - x(t + 1)\} \mid x(t)) = \{1 - (2N)^{-1}\}x(t)\{1 - x(t)\}, \tag{1.50}$$

so that the expected value of the heterozygosity measure $2x(\cdot)\{1 - x(\cdot)\}$
decreases by a factor of $1 - (2N)^{-1}$ each generation. It follows immediately
that $1 - (2N)^{-1}$ is an eigenvalue of the matrix $P$, and the theory
in Appendix A shows that it is the leading nonunit eigenvalue. We write
the right and left eigenvectors corresponding to this eigenvalue as
$r = (r_0, r_1, r_2, \ldots, r_{2N})'$ and $\ell' = (\ell_0, \ell_1, \ldots, \ell_{2N})$ respectively. It follows
from (1.50) that $r'$ is proportional to the vector

$$\{0,\ 2N - 1,\ 2(2N - 2),\ 3(2N - 3),\ \ldots,\ 2N - 1,\ 0\}. \tag{1.51}$$

Unfortunately, no such simple formula exists for the left eigenvector $\ell$. If
we suppose that $\ell$ and $r$ are normalized by the requirements

$$\sum_{k=1}^{2N-1} \ell_k = 1, \qquad \sum_{k=0}^{2N} \ell_k r_k = 1, \tag{1.52}$$

then (2.140) shows that

$$p_{ij}^{(t)} = \mathrm{Prob}\{X(t) = j \mid X(0) = i\} = r_i\ell_j\{1 - (2N)^{-1}\}^t + o\{1 - (2N)^{-1}\}^t \quad \text{for } t \text{ large}. \tag{1.53}$$

Equations (1.50) and (1.53) jointly provide much interesting information. 
It is clear that especially in a large population, the mean heterozygosity 
of the population decreases extremely slowly with time as a result of the 
sampling drift implicit in the process under consideration. We conclude 
that although genetic variation must ultimately be lost under the model 
(1.48), the loss is usually very slow. This slow rate of loss may be thought 
of as a stochastic analogue of the “variation-preserving” property of infinite 
genetic populations shown by the Hardy-Weinberg law. It is appropriate
to quote Fisher (1958, p. 95) on this conclusion: “No result could bring
out more forcibly the contrast between the conservation of the variance
in particulate inheritance, and its dissipation in inheritance conforming to
the blending theory”. We shall generalize this conclusion later, taking into
account not only mutation but also complications brought about through
variation in the population size, through geographical factors, through the
existence of two sexes, and so on.
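
Both the eigenvector property of the vector (1.51) and the slowness of the decay implied by (1.50) are easy to illustrate numerically. The sketch below checks that $P r = \{1 - (2N)^{-1}\} r$ for $r_i = i(2N - i)$ in a small population, and computes the number of generations needed for the expected heterozygosity to halve. The helper names and the population sizes used are illustrative assumptions.

```python
from math import comb, log, log1p


def wright_fisher_row(i, two_n):
    """Row i of the transition matrix (1.48)."""
    p = i / two_n
    return [comb(two_n, j) * p**j * (1 - p)**(two_n - j)
            for j in range(two_n + 1)]


def heterozygosity_half_life(n):
    """Generations for the expected heterozygosity to halve, from (1.50)."""
    return log(0.5) / log1p(-1 / (2 * n))


two_n = 12
eigenvalue = 1 - 1 / two_n
r = [i * (two_n - i) for i in range(two_n + 1)]      # the vector (1.51)

# Apply P to r and compare with eigenvalue * r, entry by entry.
Pr = [sum(p_ij * r_j for p_ij, r_j in zip(wright_fisher_row(i, two_n), r))
      for i in range(two_n + 1)]
max_error = max(abs(Pr[i] - eigenvalue * r[i]) for i in range(two_n + 1))

print(max_error)
print(heterozygosity_half_life(10**6))
```

For $N = 10^6$ the half-life is close to $2N \log 2$, that is about 1.4 million generations.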

What can be said about the distribution of $X(t)$ for large $t$, given
$X(t) \neq 0, 2N$? Both Fisher (1958, pp. 90-96) and Wright (1931, pp. 111-116) paid
considerable attention to this question. It is clear from (1.52) and (1.53)
that

$$\lim_{t \to \infty} \mathrm{Prob}\{X(t) = j \mid X(t) \neq 0, 2N\} = \ell_j, \quad j = 1, 2, \ldots, 2N - 1. \tag{1.54}$$

Furthermore, both Fisher (1958, p. 94) and Wright (1931, p. 113) show that
$\ell_j \approx (2N - 1)^{-1}$, so that the asymptotic distribution under consideration is
essentially uniform. Although both Fisher and Wright devoted considerable
attention to this distribution, and indeed to very accurate expressions for
it, especially for very small and very large values of $j$, it is of far less
importance than would appear from the extensive discussion that they
devoted to it. The reason for this is that the complete spectral expression
for $p_{ij}^{(t)}$, of which (1.53) gives the leading terms and which was unknown
to Fisher and Wright, shows that by the time this distribution becomes
relevant it is almost certain that fixation or loss of $A_1$ will already have
occurred. This observation, due to Kimura (1955a), will be taken up in
more detail later. For the moment we use it to justify our passing over 
further discussion of this asymptotic distribution. 

A more important question, also taken up by Fisher (1958, p. 96) and
Wright (1931, p. 116), although in a rather different form than that used
later in this book, is the following. Suppose that in an otherwise purely
$A_2A_2$ population, a single new mutant $A_1$ gene arises. No further mutation
occurs, so from this point on the model (1.48) applies. How much time will
pass before the mutant is lost (probability $1 - (2N)^{-1}$) or fixed (probability
$(2N)^{-1}$)? The mean number of generations $\bar{t}_1$ for one or other of these
events may be written in the form

$$\bar{t}_1 = \sum_{j=1}^{2N-1} \bar{t}_{1,j}, \tag{1.55}$$

where $\bar{t}_{1,j}$ is the mean number of generations that the number of $A_1$ genes
takes the value $j$ before reaching either 0 or $2N$. Both Fisher and Wright
found that

$$\bar{t}_{1,j} \approx 2j^{-1}, \quad j = 1, 2, \ldots, 2N - 1, \tag{1.56}$$

so that using (1.55),

$$\bar{t}_1 \approx 2(\log(2N - 1) + \gamma), \tag{1.57}$$

where $\gamma$ is Euler’s constant 0.5772... This expression is the $C^{-1}$ of Wright
(1931, p. 117); Fisher (1958) found the extremely accurate expression
$2(\log(2N - 1) + \gamma) + 0.200645 + O(N^{-1})$, which for large $N$ is correct
to at least 5 decimal places, as well as expressions for $\bar{t}_{1,j}$ that are more
accurate, for small $j$, than those provided by (1.56).

We derive the result (1.57) later (see (5.23)), using methods other than 
those employed by Fisher and Wright. 
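
For a small population the mean time until loss or fixation of a single new mutant can be computed directly from the transition matrix (1.48) and compared with the approximation $2(\log(2N - 1) + \gamma)$. The sketch below is one possible way of doing this: it solves the standard first-step equations $t = 1 + Qt$, where $Q$ is (1.48) restricted to the states $1, \ldots, 2N - 1$, by repeated substitution. The method, the population size and the number of sweeps are our own illustrative choices, not those of Fisher or Wright.

```python
from math import comb, log

EULER_GAMMA = 0.5772156649015329


def mean_absorption_times(two_n, sweeps=2000):
    """Mean time to reach 0 or 2N from each interior state of (1.48).

    Solves t = 1 + Q t by repeated substitution, which converges because
    the leading eigenvalue of Q is 1 - 1/(2N) < 1.
    """
    interior = range(1, two_n)
    Q = [[comb(two_n, j) * (i / two_n)**j * (1 - i / two_n)**(two_n - j)
          for j in interior] for i in interior]
    t = [0.0] * len(Q)
    for _ in range(sweeps):
        t = [1.0 + sum(q * tj for q, tj in zip(row, t)) for row in Q]
    return t          # t[0] corresponds to a single initial mutant


two_n = 20
exact = mean_absorption_times(two_n)[0]
approx = 2 * (log(two_n - 1) + EULER_GAMMA)              # right-hand side of (1.57)
harmonic = 2 * sum(1 / j for j in range(1, two_n))       # sum of the values 2/j

print(exact, approx, harmonic)
```

Even for a population as small as this the three values are of similar size, and the agreement can be expected to improve as $N$ increases.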

There is an ergodic equivalent to the expressions in (1.55) and (1.56) 
which is perhaps of more interest than (1.55) and (1.56) themselves, and 
which is indeed the route by which Fisher arrived at these formulas. Consider
a sequence of independent loci, each initially purely “$A_2A_2$”, and at
which a unique mutation $A_1$ occurs in generation $k$ in the $k$th member of
the sequence. We may then ask how many such loci will be segregating for
$A_1$ and $A_2$ after a long time has passed, and at how many of these loci
there will be exactly $j$ “$A_1$” genes. It is clear that the mean values of these
quantities are $\bar{t}_1$ and $\bar{t}_{1,j}$ respectively, and this gives us some idea, at least
insofar as the model (1.48) is realistic, of how much genetic variation we 
may expect to see in any population at a given time. The question of the 
amount, and the nature, of the genetic variation that can be expected in a 
population at any given time will be taken up later at much greater length. 

Wright (1931, p. 129) and Fisher (1958, p. 99) also considered the mod- 
ifications to these results when selective differences exist. Again we do not 
pursue the details of their calculations since we arrive later at their results 
by other methods. Suppose we assume fitness values of the form (1.25b). 
Then it is reasonable to replace (1.48) by the model 

$$p_{ij} = \binom{2N}{j}\eta_i^j(1 - \eta_i)^{2N-j}, \quad i, j = 0, 1, 2, \ldots, 2N, \tag{1.58}$$

where now

$$\eta_i = \frac{(1 + s)i^2 + (1 + sh)i(2N - i)}{(1 + s)i^2 + 2(1 + sh)i(2N - i) + (2N - i)^2}. \tag{1.59}$$

We may again ask what values $\bar{t}_1$ and $\bar{t}_{1,j}$ assume. This problem was attacked
by Fisher and Wright only in the case $h = \tfrac{1}{2}$. We shall show later,
for general values of $h$, that

$$\bar{t}_{1,j} \approx \frac{2\int_x^1 \psi(y)\,dy}{2Nx(1 - x)\psi(x)\int_0^1 \psi(y)\,dy}, \tag{1.60}$$

where $x = j/2N$ and

$$\psi(x) = \exp\{-2\alpha hx + (2h - 1)\alpha x^2\}, \tag{1.61}$$

with $\alpha$ defined by $\alpha = 2Ns$. When there is no selection the value of $\alpha$
is 0, so that $\psi(x) \equiv 1$, and the expression in (1.60) reduces to that in
(1.56), as we would wish. For the zero dominance case, where $h = \tfrac{1}{2}$, the
approximation (1.60) reduces to

$$\bar{t}_{1,j} \approx \frac{2(1 - \exp\{-\alpha(1 - x)\})}{2Nx(1 - x)\{1 - \exp(-\alpha)\}}, \tag{1.62}$$



agreeing with the value given by Fisher (his $4an$ is our $\alpha$). For $h \neq \tfrac{1}{2}$ the
right-hand side in (1.60) cannot be evaluated explicitly, although clearly
numerical approximation is possible. In all cases $\bar{t}_1 = \sum_j \bar{t}_{1,j}$.
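
As an indication of how such a numerical approximation might be carried out, the sketch below evaluates the right-hand side of (1.60) by Simpson's rule and compares it with the closed form (1.62) when $h = \tfrac{1}{2}$, and with $2j^{-1}$ when there is no selection. The quadrature rule, the number of panels and the parameter values are our own illustrative choices.

```python
from math import exp


def psi(x, alpha, h):
    """The function psi(x) of (1.61)."""
    return exp(-2 * alpha * h * x + (2 * h - 1) * alpha * x**2)


def integrate(func, a, b, panels=2000):
    """Composite Simpson rule; panels must be even."""
    width = (b - a) / panels
    total = func(a) + func(b)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * func(a + k * width)
    return total * width / 3


def sojourn_time(j, two_n, s, h):
    """Approximate mean number of generations spent at j copies, from (1.60)."""
    alpha = two_n * s                      # alpha = 2Ns
    x = j / two_n
    g = lambda y: psi(y, alpha, h)
    numerator = 2 * integrate(g, x, 1)
    denominator = two_n * x * (1 - x) * psi(x, alpha, h) * integrate(g, 0, 1)
    return numerator / denominator


def sojourn_time_no_dominance(j, two_n, s):
    """Closed form (1.62), valid for h = 1/2 and s != 0."""
    alpha = two_n * s
    x = j / two_n
    return (2 * (1 - exp(-alpha * (1 - x)))
            / (two_n * x * (1 - x) * (1 - exp(-alpha))))


print(sojourn_time(10, 200, 0.02, 0.5), sojourn_time_no_dominance(10, 200, 0.02))
print(sojourn_time(10, 200, 0.0, 0.3), 2 / 10)
print(sojourn_time(10, 200, 0.02, 0.9))    # a case with no closed form
```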

Both Fisher and Wright used the approximation (1.62) to find the probability
that a new mutant $A_1$ will eventually become fixed in the population.
Their method, which is quite different from the one we consider later, is as
follows. Suppose in (1.62) we put $x = 1 - \delta$ and consider small values of $\delta$.
Then (1.62) reduces in effect to

$$\frac{2\alpha}{2N\{1 - \exp(-\alpha)\}}, \tag{1.63}$$

which, as $\alpha \to 0$, approaches $2/2N$. We now argue that since the probability
of fixation of $A_1$ for the neutral case ($\alpha = 0$) is known to be $(2N)^{-1}$, the
probability of fixation in the case we are considering must be one half of
(1.63), that is

$$\mathrm{Prob}(A_1 \text{ becomes fixed}) = \frac{s}{1 - \exp(-2Ns)}. \tag{1.64}$$



This is identical to the value given by Fisher (1958, p. 100) and Wright 
(1931, p. 133) upon setting our s equal to Fisher’s 2a and Wright’s 2s. 

Equation (1.64) influenced Fisher considerably. He was accustomed to
think in terms of very large populations; thus he gave a table of values of
$\bar{t}_1$ (see (1.55)) for values of $N$ ranging from $10^6$ to $10^{12}$ and wrote later of
populations of size of a thousand million as though they were typical. The
ratio of the right-hand side in (1.64) to the value $(2N)^{-1}$ applying for the
case $s = 0$ is

$$\alpha/(1 - \exp(-\alpha)), \tag{1.65}$$

and for the values $\alpha = -4$, 0 and 4 this ratio takes the values 0.08, 1 and 4.
Thus, as noted by Fisher, increasing $\alpha$ from $-4$ to $+4$ increases the probability
of fixation of $A_1$ by a factor of about 50. Thus in a population of size
$10^9$, only a minute range of selective differences around zero leads effectively
to the same fixation probability as for complete selective equivalence. As
an alternative way of noting this, an increase in $s$ from 0 to $10^{-6}$ increases
the probability of fixation of $A_1$ by a factor of 2,000 in a population of this
size. These considerations strongly influenced Fisher in arriving at the view 
that selective differentials are of paramount importance in determining the 
genetic evolutionary behavior of populations, and that the randomness in 
the behavior of gene frequencies brought about by the finite nature of the 
population size in no way seriously undermines the Darwinian theory. 
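
The sensitivity of the fixation probability to $\alpha$ is easily tabulated. The sketch below evaluates the ratio (1.65), treating $\alpha = 0$ as the limiting case; the function name is an illustrative choice. It may be noted that the ratio at $\alpha$ divided by the ratio at $-\alpha$ is exactly $e^{\alpha}$, which for $\alpha = 4$ is about 55.

```python
from math import exp, expm1


def fixation_ratio(alpha):
    """Ratio (1.65) of the fixation probability (1.64) to the neutral value 1/(2N)."""
    if alpha == 0:
        return 1.0                      # limiting value as alpha -> 0
    return alpha / -expm1(-alpha)       # -expm1(-alpha) equals 1 - exp(-alpha)


for alpha in (-4, 0, 4):
    print(alpha, fixation_ratio(alpha))

# A population of size N = 10**9 with s = 10**-6, so that alpha = 2Ns = 2000.
N, s = 10**9, 1e-6
print(fixation_ratio(2 * N * s))
```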

There were two reasons why Wright was less influenced than was Fisher 
by formulas such as (1.65). First, he was accustomed to think in terms
of population sizes far smaller than $10^9$. His view of the optimal circumstance
under which evolution occurs, which we consider in more detail later,
was rather different from Fisher’s, and involves random changes in gene 
frequencies in populations of comparatively small size as one significant 
component. Second, Wright considered comparatively short-term behav- 
ior whereas Fisher was accustomed to focus on very long-term behavior, 
for which comparatively short-term stochastic effects are eventually dom- 
inated by the long-term effects of selective differences. We return to this 
comparison of emphases later. 

A further problem of an essentially stochastic nature, considered almost
exclusively by Wright (1930, pp. 133-134), concerns the stationary distribution
of the frequency of $A_1$ when, in addition to the changes in frequencies
brought about by selection and the random changes due to the finite nature
of the population size, we allow mutation from $A_1$ to $A_2$ (at rate $u$)
and from $A_2$ to $A_1$ (at rate $v$). In this case we may reasonably replace the
transition probability (1.58) by

$$p_{ij} = \binom{2N}{j}(\eta_i^*)^j(1 - \eta_i^*)^{2N-j}, \tag{1.66}$$

where $\eta_i^*$ is given by

$$\eta_i^* = (1 - u)\eta_i + (1 - \eta_i)v, \tag{1.67}$$

$\eta_i$ being defined by (1.59). If we put $x = X(\cdot)/2N$, Wright showed in effect
that the stationary distribution of $x$ is of the form

$$f(x) = \mathrm{const}\; x^{4Nv-1}(1 - x)^{4Nu-1}\exp\{2\alpha hx - (2h - 1)\alpha x^2\}, \tag{1.68}$$

the constant being chosen so that $\int_0^1 f(x)\,dx = 1$.

When the heterozygote is at a selective advantage it is perhaps better to 
use the fitness parameters (1.25c) to arrive at the equivalent formula 

$$f(x) = \mathrm{const}\; x^{4Nv-1}(1 - x)^{4Nu-1}\exp\{2\alpha_2x - (\alpha_1 + \alpha_2)x^2\}, \tag{1.69}$$

where $\alpha_i = 2Ns_i$. In these formulas the relative effects of the population
size, the selective coefficient and the mutation rate on the form of the
distribution can be ascertained. Thus if mutation rates are sufficiently small
so that $4Nu < 1$, $4Nv < 1$, some accumulation of probability occurs near
$x = 0$ and near $x = 1$. This does not, however, necessarily mean that most
of the mass of the probability distribution is near these points, and it is 
quite possible that the most likely values for x are determined more by 
selection than by mutation. 

As an example we consider the case $N = \tfrac{1}{4} \times 10^5$, $u = v = 5 \times 10^{-6}$ and,
in the notation (1.25c), $s_1 = s_2 = 2 \times 10^{-3}$. Inserting these values in (1.69)
we arrive at the stationary distribution

$$f(x) = Cx^{-1/2}(1 - x)^{-1/2}\exp\{200x(1 - x)\} \tag{1.70}$$

for the frequency of $A_1$. The constant $C$ is again chosen so that $\int_0^1 f(x)\,dx = 1$.
To compare the effects of mutation and selection we compare the integral
of the density function over two small sub-intervals, one near 0 and the
other near $\tfrac{1}{2}$. Thus we find, for example, that the probability that the
frequency $x$ of $A_1$ is less than 0.0001 or greater than 0.9999 is approximately

$$2C\int_0^{0.0001} x^{-1/2}\,dx \approx 0.04C, \tag{1.71}$$

while the probability that $x$ is between 0.4999 and 0.5001 is approximately

$$0.0004\,C\exp(50). \tag{1.72}$$

This is about $10^{20}$ times larger than the value given in (1.71), and indicates
that in this case the selective forces have a far greater influence on the likely
values that $x$ will assume than have the mutation rates. Although this
example has a high degree of symmetry implicit in it, a parallel result will
hold for asymmetric cases where the selective coefficients and the mutation
rates are of the same order of magnitude as those in this example. Thus
if $u = 5 \times 10^{-6}$, $v = 10^{-5}$, $s_1 = 10^{-3}$, $s_2 = 2 \times 10^{-3}$, selection is again
far more important than mutation in determining the likely values of $x$.
In general this conclusion will hold so long as the selective differentials are
at least 100 times larger than the mutation rate. If in the above example
$s_1 = s_2 = 2 \times 10^{-4}$, the probability of a value of $x$ less than 0.0001 or
greater than 0.9999 is of the same order of magnitude as the probability of
a value between 0.4999 and 0.5001, while if $s_1 = s_2 = 2 \times 10^{-5}$, the former
probability is rather larger than the latter.
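
The three symmetric cases just described can be compared in a few lines. The sketch below uses the same crude approximations as (1.71) and (1.72), so that the unknown constant $C$ cancels, and assumes $u = v$ with $4Nu = \tfrac{1}{2}$ and $s_1 = s_2 = s$ as in the example; under these approximations the ratio of the central to the boundary probability is $0.01\exp(Ns)$. The function name is an illustrative choice.

```python
from math import exp


def central_to_boundary_ratio(n, s):
    """Ratio of the approximation (1.72) to the approximation (1.71).

    Assumes u = v with 4Nu = 1/2 and s1 = s2 = s, so that the density is
    C x^(-1/2) (1 - x)^(-1/2) exp{2 alpha x (1 - x)} with alpha = 2Ns.
    """
    alpha = 2 * n * s
    # Two tails, each the integral of x^(-1/2) over (0, 0.0001), in units of C.
    boundary = 2 * (2 * 0.0001**0.5)
    # Interval width 0.0002 times the density at x = 1/2, in units of C.
    central = 0.0002 * 2 * exp(alpha / 2)
    return central / boundary


N = 25000                       # one quarter of 10**5
for s in (2e-3, 2e-4, 2e-5):
    print(s, central_to_boundary_ratio(N, s))
```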

As a particularly important application of stochastic process theory, 
Fisher (1922), Haldane (1927b) and Wright (1931) all considered the spe- 
cific problem of the probability of survival of a single new favorable mutant 
allele. This probability has already been computed, for the case of selection 
without dominance, in (1.64). A rather different approach, using the theory 
of branching processes, may be used to approximate this probability, and 
it is of some interest to outline the elements of this method. To do this we
follow the treatment of Fisher (1930a). 

We consider a population with nonoverlapping generations, the various
generations existing at a sequence of time points 0, 1, 2, 3, ..., and suppose
that there are $X_n$ genes (or “individuals”) at time $n$. Each of these $X_n$
individuals gives rise to a number of offspring individuals and then dies. At time $n + 1$ each
of these offspring in turn produces offspring, and so on. We suppose a given
fixed distribution for the number of offspring for each individual and that
the numbers of offspring to those individuals alive at any given time are
independent. The values $X_0, X_1, X_2, \ldots$ form a Markov chain. In this
branching process Markov chain model no fixed upper limit can be set to
the values of the $X_n$. We suppose that each individual leaves $i$ offspring
with probability $p_i$, and introduce the generating function

$$p(z) = p_0 + p_1z + p_2z^2 + \cdots, \tag{1.73}$$

where $z$ is a dummy variable. Clearly, if the mean and variance of the
distribution $\{p_i\}$ are denoted $\mu$ and $\sigma^2$, we have

$$p(1) = 1, \quad p'(1) = \mu, \quad p''(1) = \sigma^2 - \mu + \mu^2. \tag{1.74}$$

We assume that the branching process starts with one individual in generation
0 (that is, $X_0 = 1$). Then the generating function of the number
of individuals in generation 1 is $p(z)$, and that for generation 2 can be found in
the following way. We have

$$\mathrm{Prob}\{X_2 = i\} = \sum_j \mathrm{Prob}\{X_2 = i \mid X_1 = j\} \times \mathrm{Prob}\{X_1 = j\}. \tag{1.75}$$

Given that $X_1 = j$, the probability that $X_2 = i$ is evidently the coefficient
of $z^i$ in $\{p(z)\}^j$. Thus from (1.75)

$$\mathrm{Prob}\{X_2 = i\} = \text{coeff } z^i \text{ in } \sum_j p_j\{p(z)\}^j = \text{coeff } z^i \text{ in } p(p(z)).$$

It follows that the generating function of the distribution of $X_2$ is $p(p(z))$
and, more generally, the generating function of the distribution of $X_n$ is
the $n$th functional iterate $p_n(z)$, defined by

$$p_n(z) = p(p_{n-1}(z)) = p_{n-1}(p(z)). \tag{1.76}$$
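
When the offspring distribution has finite support, the functional iterate can be computed explicitly by composing polynomials. The sketch below does this for a hypothetical offspring distribution $p_0 = \tfrac{1}{4}$, $p_1 = \tfrac{1}{2}$, $p_2 = \tfrac{1}{4}$ (mean 1), chosen only for illustration, and reads off the distribution of $X_2$ from the coefficients of $p(p(z))$.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest power first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def poly_add(a, b):
    """Sum of two polynomials given as coefficient lists."""
    size = max(len(a), len(b))
    return [(a[k] if k < len(a) else 0.0) + (b[k] if k < len(b) else 0.0)
            for k in range(size)]


def compose(outer, inner):
    """Coefficients of outer(inner(z)), built up by Horner's rule."""
    result = [outer[-1]]
    for coeff in reversed(outer[:-1]):
        result = poly_add(poly_mul(result, inner), [coeff])
    return result


def evaluate(poly, z):
    """Value of the polynomial at z."""
    return sum(c * z**k for k, c in enumerate(poly))


offspring = [0.25, 0.5, 0.25]                  # hypothetical p_0, p_1, p_2
generation_2 = compose(offspring, offspring)   # coefficients of p(p(z))

print(generation_2)                            # Prob{X_2 = i}, i = 0, ..., 4
print(generation_2[0], evaluate(offspring, evaluate(offspring, 0.0)))
```

The constant term of $p(p(z))$ is the probability that the line is extinct by generation 2, and equals $p(p(0))$.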

Fisher (1930a) was interested in three quantities. The first is the probability
$\pi_n$ that $X_n = 0$, the second is the limiting value of $\pi_n$ as $n \to \infty$, and
the third the conditional probability distribution of $X_n$ for $n$ large, given
$X_n \neq 0$. By setting $z = 0$ in (1.76) we see immediately that $\pi_n$ satisfies the
functional relation

$$\pi_{n+1} = p(\pi_n), \quad n = 1, 2, 3, \ldots, \tag{1.77}$$

with $\pi_0 = 0$. By letting $n \to \infty$ in (1.77), we see that the limiting value $\pi$ of
$\pi_n$ satisfies

$$\pi = p(\pi), \tag{1.78}$$

and it is not hard to show that the required value $\pi$ is the smallest positive
root of (1.78). Putting $\pi = 1 - \delta$ ($\delta$ small), a Taylor series expansion in
(1.78) yields

$$1 - \delta \approx 1 - \delta p'(1) + \tfrac{1}{2}\delta^2p''(1), \tag{1.79}$$

and if $\mu = 1 + \epsilon$ ($\epsilon$ small, positive), (1.74) and (1.79) show that

$$\delta \approx \frac{2\epsilon}{\sigma^2}. \tag{1.80}$$

We shall defer consideration of the conditional distribution of $X_n$ ($X_n \neq 0$)
for a moment and examine it only in a case of particular genetic interest.

We turn now to the application of these results in genetics, following 
the approach used by Fisher (1930a). Consider the case of a nonrecessive
$A_1$ mutant gene introduced into a previously purely $A_2A_2$ population. Homozygotes
$A_1A_1$ will not usually appear until the number of $A_1$ genes is
comparatively large (of order $\sqrt{N}$, where $N$ is the population size) and by
this time the fate of the new mutant, that is, whether it will die out or
not, is usually in effect settled. Thus although it is clear that the assumptions
made in the theory of branching processes are not exactly met for
populations of fixed size, it should be possible using this theory to obtain
rather close approximations to several quantities of evolutionary interest,
increasing in accuracy as $N \to \infty$. If this is done, the expression “survival
of a new mutant” is then taken to mean the increase in the frequency of a 
mutant to a point where the probability of loss of the mutant by accidents 
of sampling in anything other than a very long time may safely be ignored. 
We have in mind in particular either the fixation of the mutant in the pop- 
ulation or the attainment of a quasi-stable equilibrium point determined, 
for example, by heterozygote selective advantage. 

We are mainly interested in establishing results for populations of stable 
size, and by convention we do this by using the fitness scheme (1.25b), 
where the values are now taken as absolute fitnesses. We thus identify the 
unit fitness of the prevailing genotype A 2 A 2 with stable population size. 
Assuming the model (1.58), we may reasonably suppose each mutant A\ 
gene produces a random number of A\ “offspring” according to the binomial 
distribution with index 2N and parameter (1 + sh)/2N. To a sufficient 
approximation we may replace this distribution by a Poisson distribution 
with parameter 1 + sh. In this case the generating function (1.73) becomes 

p(z) — exp {(z - 1)(1 + sh)}, (1.81) 

and the approximation (1.80) yields 

S^2sh. (1.82) 

For h — \ this agrees with the value (1.64) found by diffusion methods, 
at least for values of N sufficiently large so that exp (—Ns) may safely be 
ignored. This confirms the view that the branching process approximation 
is most accurate for large N. Equation (1.77) becomes 

7r n+ i = exp{(7r n - 1)(1 + sh)}, (1.83) 

an equation which may be iterated numerically to provide values of 7r n for 
any value of n. This was done by Fisher for s = 0 and for sh = 0.01 and 
the numerical values found confirm the approximation (1.82), which Fisher 
did not use explicitly. 
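
The iteration (1.83) is easy to reproduce numerically. The following Python sketch is illustrative only. It assumes the starting value $\pi_0 = 0$ (the mutant line is certainly present in generation 0) and reads $\pi_n$ as the probability that the line has been lost by generation $n$, which is how $\pi_n$ is used in the rest of this section. The function name `loss_probabilities` is ours.

```python
import math

def loss_probabilities(sh, n_generations):
    """Iterate pi_{n+1} = exp{(pi_n - 1)(1 + sh)} from pi_0 = 0.

    Returns the list [pi_0, pi_1, ..., pi_n]. The starting value pi_0 = 0
    is an assumption of this sketch (the line is present in generation 0).
    """
    pis = [0.0]
    for _ in range(n_generations):
        pis.append(math.exp((pis[-1] - 1.0) * (1.0 + sh)))
    return pis

if __name__ == "__main__":
    sh = 0.01
    pis = loss_probabilities(sh, 5000)
    survival = 1.0 - pis[-1]
    print(f"1 - pi_n after 5000 generations: {survival:.5f}")
    print(f"approximation 2sh:               {2 * sh:.5f}")
```

For $sh = 0.01$ the limiting value of $1 - \pi_n$ lies slightly below $2sh = 0.02$, in line with the approximate nature of (1.82).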

The case s = 0 is of particular interest. Here 

$$\pi_{n+1} = \exp(\pi_n - 1). \tag{1.84}$$

Since $\pi_n \to 1$ as $n \to \infty$ it is interesting to attempt an approximate solution
of (1.84) in the form

$$\pi_n \approx 1 - cn^{-1}.$$

Insertion of this trial value into (1.84) gives $c = 2$ and hence

$$\pi_n \approx 1 - 2n^{-1}. \tag{1.85}$$

This value was given by Fisher from inspection of the numerical iteration
(1.84).
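
A quick numerical check of (1.85) can be made along the same lines. The sketch below again assumes $\pi_0 = 0$ and uses a function name of our own choosing. It iterates (1.84) and prints $n(1 - \pi_n)$, which should approach $c = 2$ if the trial solution is appropriate.

```python
import math

def neutral_loss_probability(n):
    """Return pi_n from pi_{k+1} = exp(pi_k - 1), starting at pi_0 = 0 (assumed)."""
    pi = 0.0
    for _ in range(n):
        pi = math.exp(pi - 1.0)
    return pi

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        pi_n = neutral_loss_probability(n)
        # If pi_n is close to 1 - 2/n, then n * (1 - pi_n) is close to 2.
        print(n, round(n * (1.0 - pi_n), 4))
```

The convergence is slow, so the product is noticeably below 2 for small $n$ and approaches 2 only gradually.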

We turn finally to the conditional distribution of $X_n$, given $X_n \neq 0$, for the
case $s = 0$. Here we merely outline Fisher’s conclusion. It is clear that the
unconditional mean of $X_n$ is unity, and hence from (1.85) the conditional
mean of $X_n$ (given $X_n > 0$) is approximately $\tfrac{1}{2}n$. It is thus reasonable to
consider the normalized variable $y_n = X_n/n$ which, given $X_n > 0$, we may
hope will possess a limiting distribution as $n \to \infty$. By using generating
function techniques, Fisher showed that the limiting ($n \to \infty$) distribution
of $y_n$ is

$$f(y) = 2\exp(-2y), \quad y > 0, \tag{1.86}$$

so that in particular

$$\text{Prob}(X_n > kn) = \text{Prob}(y > k) \approx \exp(-2k). \tag{1.87}$$

In the case $sh > 0$, Fisher proved that the conditional distribution of
$y = X_n/(1 + sh)^n$, given $X_n > 0$, is asymptotically

$$f(y) = 2sh\exp(-2shy), \quad y > 0. \tag{1.88}$$

Thus

$$\text{Prob}(X_n > \lambda(1 + sh)^n \mid X_n > 0) = \text{Prob}(y > \lambda \mid y > 0) \approx \exp(-2\lambda sh). \tag{1.89}$$

It should be emphasized that these conclusions, while they are arrived at 
by considering indefinitely large values of n, nevertheless apply only if the 
number of mutants involved is far less than the population size $N$, for it is
only for such values that branching process approximations are legitimate. 
This is true particularly of equation (1.89). 

What evolutionary conclusions can be drawn from these calculations? 
The first, and perhaps most important, is that while the survival probabil- 
ity (1.82) is small, it is nevertheless positive. Thus while the lines initiated 
by most favorable mutations will die out, and usually rather rapidly, the 
eventual survival of a favorable mutant is certain if mutation is recurrent. 
Thus taking the case $s = 0.01$, $h = \tfrac{1}{2}$, a mutation rate of $10^{-6}$ in a
population of size $10^8$ will produce 200 mutations per generation, and the
probability that none of the mutational lines initiated in just the first
generation survive is only $(0.99)^{200} \approx 0.13$. In a larger population, or with a
larger mutation rate, this probability is diminished even further. It follows
that in large populations a favorable new mutant will begin to establish
itself rather soon after mutation to it commences. We may then use equa- 
tions such as (1.28) to consider how long various degrees of establishment 
will require. On the other hand, in small populations, and even more im- 
portant with unique mutational events, the small individual probability of 
survival of the line initiated by a single mutant is a factor which must be 
incorporated into evolutionary considerations. 
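
The arithmetic behind the recurrent-mutation example can be laid out explicitly. The sketch below assumes, as the figure of 200 implies, that the mutation rate applies per gene and that a population of $N$ diploid individuals carries $2N$ genes. It also treats the mutant lines as independent. The function and variable names are ours.

```python
def prob_no_line_survives(s, h, mutation_rate, population_size):
    """Probability that none of one generation's new mutant lines survives.

    Uses the branching-process survival probability 2sh of (1.82) for each
    line and treats the lines as independent (an assumption of this sketch).
    Returns (expected number of new mutants, probability that all are lost).
    """
    new_mutants = 2 * population_size * mutation_rate  # 2N genes per generation
    survival_per_line = 2 * s * h
    return new_mutants, (1.0 - survival_per_line) ** new_mutants

if __name__ == "__main__":
    mutants, p_none = prob_no_line_survives(0.01, 0.5, 1e-6, 10**8)
    print(f"new mutants per generation: {mutants:.0f}")
    print(f"P(no line survives):        {p_none:.3f}")
```

Raising either the population size or the mutation rate in the call shows how quickly the probability of losing every line falls away.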

A second observation concerns the origin and potential selective advan- 
tage of a mutant which has spread to large numbers in a population. We 
may take as a numerical example a population of size $10^7$ containing $10^5$ $A_1$
genes. If these genes enjoy no selective advantage and arose from a single
mutational event, (1.87) shows that the mutation most likely occurred at
least $10^5$ generations in the past. However, if the mutation to the allele in
question is recurrent, the average time required for the current frequency
$10^5$ is rather less, while if the mutant possesses a selective advantage its
present frequency can be explained by a comparatively rapid recent increase 
in numbers. 

A final comment concerns populations whose sizes are not stationary. 
Any mutant in a population of uniformly increasing size will have its
survival probability increased. We consider as an example a new mutant having
selective advantage 0.01 arising in a population of $10^4$. Suppose now that
the population doubles in size for eight generations and stabilizes at a size
of $256 \times 10^4$. If the doubling in population size were to continue indefinitely,
the new mutant would have a probability $\pi$ of loss satisfying the equation

$$\pi = \exp(2.02(\pi - 1)),$$

the solution of which is $\pi = 0.1978$. When doubling stops after eight
generations the probability of loss of the mutant is rather greater than this, being
approximately 0.3. A converse comment applies for mutants in decreasing
populations. Thus populations that are increasing in size should exhibit 
a greater variety of forms compared to populations that have a stable size or
are decreasing in size. The variety will perhaps diminish once stability of 
population size is reached, since some unfavorable mutants which increased 
in numbers because of the increase in population size will now die out. In 
practice, of course, any protracted increase in size must occur at a rather 
low rate, and thus this argument applies most to mutations whose selective 
advantage or disadvantage is rather small. 
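
The doubling example can be explored numerically. The sketch below is one possible reading of it, not a reconstruction of the original calculation. It assumes that each mutant gene leaves a Poisson number of copies with mean $2(1 + s)$ in each generation in which the population doubles, and with mean $1 + s$ once the population size is stable. The function names are ours.

```python
import math

def loss_prob_fixed_point(mean_offspring, tol=1e-12, max_iter=1_000_000):
    """Smallest root of pi = exp(mean_offspring * (pi - 1)), iterated from 0."""
    pi = 0.0
    for _ in range(max_iter):
        new_pi = math.exp(mean_offspring * (pi - 1.0))
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    return pi

def loss_prob_after_doubling(s, doubling_generations):
    """Loss probability when the population doubles for a fixed number of
    generations (Poisson mean 2(1 + s)) and is stable afterwards (mean 1 + s)."""
    pi = loss_prob_fixed_point(1.0 + s)    # loss probability once size is stable
    for _ in range(doubling_generations):   # step back through the doubling phase
        pi = math.exp(2.0 * (1.0 + s) * (pi - 1.0))
    return pi

if __name__ == "__main__":
    print("indefinite doubling:", round(loss_prob_fixed_point(2.02), 4))
    print("eight doublings:    ", round(loss_prob_after_doubling(0.01, 8), 4))
```

Under these assumptions the loss probability for eight doublings comes out somewhat above the indefinite-doubling value, as the text states, and it approaches that value as the number of doubling generations is increased.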



1.5 Evolved Genetic Phenomena 

In the previous section we have asked the question: Assuming the Mendelian 
genetic scheme and given the numerical values of various genetic parame- 
ters, for example mutation rates, the degree of dominance, what conclusions 
can be drawn about evolutionary processes? Fisher, Wright and Haldane
also asked a converse question, namely: Given that evolution has occurred,
what purely genetic characteristics can be explained as a result of this evo- 
lution? Perhaps the most interesting such questions concern mutation rates, 
dominance, linkage intensities, and the sex ratio, while on a broader level 
the existence of sexual dimorphism, a Mendelian phenomenon, and even the 
pervasiveness of the Mendelian scheme itself, can be considered. Here we 
limit attention to brief comments on the first four topics, again restricting 
attention to the work done in the pioneering period we are considering. 

We have alluded already to the question of observed rates of mutation 
and the possibility that these are the results of evolutionary processes 
whereby the contrasting requirements of a low mutation rate, to preserve 
such favorable gene complexes as have been built up, and a high muta- 
tion rate, so that a large number of potentially or actually favorable new 
mutations will arise, are optimally balanced. It is difficult to quantify this 
argument, and no real attempt to do so was made during the time we are 
considering. Of course one must avoid the assumption that all presently 
observed genetic phenomena are in some sense at optimal values: it is
certainly possible to argue that current mutation rates are partly the result
of extrinsic factors having nothing to do with evolution, or at least that 
while they no doubt vary from locus to locus and time to time and are ca- 
pable of some evolutionary modification, they are not presently at optimal 
evolutionary values. 

We turn next to the question of dominance. Fisher argued that dom- 
inance is the outcome of an evolutionary process through an induced 
selection of modifier genes at loci other than the primary one under con- 
sideration. He was strongly influenced in this view by the observation that 
it is normally the prevailing wild-type allele that is dominant, so that in
the course of its becoming the prevalent type it presumably acquired the 
dominance property. We consider the details of this argument in Section 
6.5, and for the moment we only introduce the elements of the analysis. 

We consider two alleles $A_1$ and $A_2$ at a locus and assume the fitness
scheme of the form (1.25b). If $A_1$ mutates to $A_2$ at rate $u$ we may suppose
that the frequency of $A_1$ is at the mutation-selection equilibrium point
(1.34). Suppose now that at a locus $M$, at which the allele $M_2$ was
previously fixed, a mutant allele $M_1$ arises with the effect that those $A_1A_2$
individuals carrying the allele $M_1$ are altered in phenotypic expression
towards that of the prevailing homozygote $A_1A_1$. We assume that fitness is
determined by the phenotype so that the fitness scheme takes the following
form:





            $A_1A_1$    $A_1A_2$    $A_2A_2$
$M_1M_1$       1           1        $1 - s$
$M_1M_2$       1        $1 - sk$    $1 - s$
$M_2M_2$       1        $1 - sh$    $1 - s$

Here $s > 0$ and $0 < k < h < 1$. Clearly $M_1$ is at an induced selective
advantage to $M_2$ and will steadily increase in frequency to unity, bringing
about dominance of $A_1$ over $A_2$.

Several qualifications should be made about this argument. Perhaps the 
most important is that we have ignored any possible selective differences 
between $M_1$ and $M_2$ which might arise for reasons quite separate from
dominance modification at the $A$ locus. Clearly the rate of change in the
frequency of $M_1$ through dominance modification is very small, since the
selective superiority of $M_1$ over $M_2$ through this agency arises only in the
comparatively rare heterozygotes $A_1A_2$. It would require only a minute
selective advantage of $M_2$ over $M_1$ for other reasons to overcome this. Wright
(1929a,b) was strongly influenced by this argument in forming his doubts
about Fisher’s theory. Wright’s view on evolution, which we shall examine
more closely in Section 1.7, was centered around the assumption of
almost universal interactive effects of genes, so that the fate of any allele
is determined by the net selective force acting on it, the direction of this 
force being normally determined by factors more important than domi- 
nance modification. Fisher, on the other hand, believed that the selective 
advantage due to dominance modification would ultimately be effective. 
We examine his argument in more detail in Section 6.5. 

Wright (1934) put forward the more purely physiological view that 
dominance is a natural pristine characteristic, rather than an evolved char- 
acteristic, of an allele. We do not go into detail of this argument here. It 
is sufficient to note that the theory recognizes the role of genes in con- 
trolling the production of enzymes, which act as catalysts in physiological 
processes, and that one gene may well produce sufficient enzyme for a cer- 
tain process so that no further effect is produced by a second gene. The 
reader is encouraged to read Fisher’s and Wright’s original papers on this 
matter, since in no other way than by reading them can the flavor of their 
long dispute on this matter, and its bearing on their respective evolutionary 
viewpoints, be appreciated. 

We consider next the question of linkage modification. The circumstances 
under which Fisher envisioned the evolution of close linkage between two 
loci (see for example Fisher (1958, p. 116)) occur when, at two loci $A$ and
$B$, the allele $A_1$ is favored in the presence of $B_1$ while $A_2$ is favored in the
presence of $B_2$. This will imply that the double heterozygote $A_1B_1/A_2B_2$
will occur more frequently than the double heterozygote $A_1B_2/A_2B_1$ and
that a recombination between $A$ and $B$ loci will break down the former in
greater absolute numbers than they are formed by recombination from the
latter. Thus a higher recombination fraction will lead to a greater
breakdown of the “favored” gametes $A_1B_1$ and $A_2B_2$ and hence to a decrease
in the mean fitness of the population. It is convenient to give an example
of the form of fitness schemes envisioned for such a process. One set of
fitnesses having the desired characteristics is of the form

            $B_1B_1$    $B_1B_2$    $B_2B_2$
$A_1A_1$       1        $1 - a$     $1 - 4a$
$A_1A_2$    $1 - a$        1        $1 - a$          (1.91)
$A_2A_2$    $1 - 4a$    $1 - a$        1

This fitness scheme was introduced by Wright (1952) and considered in
some detail by him for purposes other than that of present interest.

An analysis of the evolutionary behavior of cases where, as in (1.91), the
fitness of any individual depends on his genetic constitution at more than
one locus is more complicated than the single-locus analysis considered so
far, and is discussed in some detail in Chapter 6. For the moment we simply
present the result of this analysis as it applies to the model (1.91) and also,
below, as it applies to the model (1.92).

The evolutionary behavior of a population for which the fitnesses are
as given in (1.91) is not as simple as one might initially expect. It can be
shown that for any value of the recombination fraction $R$ between $A$ and $B$
loci ($0 < R < \tfrac{1}{2}$), there is an equilibrium point of gamete frequencies with
all frequencies positive. However, this equilibrium is never stable. In other
words, a fitness scheme of the form (1.91) cannot maintain a stable genetic
polymorphism at either $A$ or $B$ locus and is thus of no use for considering
the argument in question.

Another fitness scheme with fitnesses of the general desired form is





            $B_1B_1$    $B_1B_2$    $B_2B_2$
$A_1A_1$       1        $1 - a$     $1 - 2a$
$A_1A_2$    $1 - a$     $1 + 2a$    $1 - a$          (1.92)
$A_2A_2$    $1 - 2a$    $1 - a$        1



This fitness scheme leads to an equilibrium point with

$$\text{freq}(A_1B_1) = \text{freq}(A_2B_2) = c^*, \qquad \text{freq}(A_1B_2) = \text{freq}(A_2B_1) = \tfrac{1}{2} - c^*, \tag{1.93}$$

where $c^*$ is the unique solution in $(\tfrac{1}{4}, \tfrac{1}{2})$ of the equation

$$12ac^3 - 8ac^2 + ac + R(1 + 2a)(c - \tfrac{1}{4}) = 0. \tag{1.94}$$

It is easy to verify geometrically that $c^*$ increases as $R$ decreases and that
$c^* \to \tfrac{1}{2}$ as $R \to 0$.

We turn next to the equilibrium value of the mean fitness $\bar{w}$, considered
as a function of $c^*$. This is

$$\bar{w} = 1 - 4ac^* + 12a(c^*)^2, \tag{1.95}$$

and since this is an increasing function of $c^*$ for $\tfrac{1}{4} < c^* < \tfrac{1}{2}$ we conclude
that the equilibrium value of $\bar{w}$ is smallest when $R$ is large and largest when
$R$ is small.

We show later that the equilibrium (1.95) is stable, at least for small
$R$, and thus we have shown that for small $R$ at least, the stable
equilibrium mean fitness decreases as the recombination fraction between the loci
increases. 
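
The dependence of $c^*$ and of the equilibrium mean fitness on $R$ can be tabulated directly. The sketch below reads the cubic (1.94) as $12ac^3 - 8ac^2 + ac + R(1 + 2a)(c - \tfrac{1}{4}) = 0$ and the mean fitness (1.95) as $1 - 4ac^* + 12a(c^*)^2$, and locates $c^*$ in $(\tfrac{1}{4}, \tfrac{1}{2})$ by bisection. The value $a = 0.1$ and the function names are arbitrary illustrative choices of ours.

```python
def equilibrium_c(a, R, tol=1e-14):
    """Root of 12ac^3 - 8ac^2 + ac + R(1 + 2a)(c - 1/4) = 0 in (1/4, 1/2)."""
    def g(c):
        return 12 * a * c**3 - 8 * a * c**2 + a * c + R * (1 + 2 * a) * (c - 0.25)
    lo, hi = 0.25, 0.5  # g(lo) < 0 < g(hi) whenever a > 0 and R > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mean_fitness(a, c):
    """Equilibrium mean fitness 1 - 4ac + 12ac^2, as in (1.95)."""
    return 1 - 4 * a * c + 12 * a * c**2

if __name__ == "__main__":
    a = 0.1  # illustrative value only
    for R in (0.5, 0.25, 0.1, 0.01, 0.001):
        c_star = equilibrium_c(a, R)
        print(f"R = {R:<6} c* = {c_star:.4f}  mean fitness = {mean_fitness(a, c_star):.4f}")
```

Running the loop shows $c^*$ rising towards $\tfrac{1}{2}$, and the mean fitness rising with it, as $R$ is reduced.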

Fisher now argued that if “different strains” have different recombination 
fractions, the strain with the smallest value will, because of its higher mean 
fitness, tend to replace the others, so that tight linkage will have evolved 
in the population. This argument, involving the new concept of interpopu- 
lational selection, will be considered further, with arguments not involving 
this form of selection, in Section 6.5. 

The final characteristic we consider is the sex ratio. Fisher’s argument 
on this is curiously non-genetic in the sense that it could well have been 
made in pre-Mendelian times. The argument involves the introduction of 
the concept of “parental expenditure”, which does not initially appear to 
be a necessary, or indeed the most obviously appropriate, vehicle for ex- 
plaining the sex ratio. The argument is that each offspring receives, while 
young, a certain expenditure on the part of its parents. Consider now a 
cohort of such offspring about to embark on reproduction. The males in 
this cohort will supply exactly half the ancestry of the descendants of this 
cohort, as will of course the females. Suppose now that the total parental 
expenditure on behalf of males is less than that of females. Then parents 
having the tendency to produce male offspring in excess will, for the same 
expenditure, tend to contribute disproportionately to the ancestry of sub- 
sequent generations. Since the same argument in reverse would apply if 
the expenditure on females were less, selection will tend to change the sex 
ratio to the point where an equal expenditure is made on female and male 
offspring. If now males suffer a heavier pre-adult mortality, then as
compared to females more of this expenditure will take place for males who die
early and do not participate in reproduction. It follows that the sex-ratio 
of males to females should exceed unity at birth but be lower than unity 
at the age of reproduction. 

This argument leads to an evolutionary adjustment of the sex ratio. 
Whether the various assumptions implicit in it are valid is uncertain, and 
what appears to be a superior verbal argument, the consequence of which
is that the sex ratio should be unity at the time of conception, is given in
Crow and Kimura (1970, pp. 288-289). We examine an argument parallel 
to Fisher’s, but based more firmly on genetic concepts, in Chapter 8. 



1.6 Modelling 

Much of the discussion in previous sections concerns the analysis of some 
model. That is, some set of assumptions, usually incorporating
mathematical formulas, is constructed attempting to describe the real-world process
or phenomenon being considered. The model is then analyzed by
mathematical or other methods to find its properties, and the implications of
these in the real world are then discussed. It is thus worthwhile to discuss, 
albeit briefly, the modeling process in mathematical evolutionary genetics. 

The concept of mathematical modeling in biology was inherited from the 
very successful modeling process in physics. But the two respective natures 
of the modeling process in the two areas are quite different. In physics one 
aims at, and largely achieves, mathematical models that describe the real 
world very precisely, based for example on Newton’s laws of dynamics. This 
allows, for example, the calculation of trajectories of space vehicles so that 
they arrive precisely at some desired location. No such precision is possible 
in evolutionary genetics. The biological world is too complex, and unpre- 
dictable phenomena ranging from mutations to large-scale ecological events 
are so prevalent, that no precise prediction of the course of evolution is pos- 
sible. Nevertheless it is possible by using mathematical models to arrive at 
general principles that do lead to important evolutionary conclusions. The 
discussion above following the Hardy-Weinberg law is an example of this,
and other examples will be given later in this book. 

Even though mathematical models in evolutionary population genetics 
cannot hope to describe the real world with the precision that is often pos- 
sible in physics, it is nevertheless important that any mathematical model 
used be well-defined and consistent, containing no internal contradictions. 
Further, no ad hoc assumptions, which can possibly contradict the implicit 
properties of the model, should be made during the course of the analysis 
of the implications of the model, since doing so can in principle lead to 
reaching any conclusion whatsoever. 

More important, any mathematical model should aim at capturing the 
essential features of reality so that the conclusions drawn from it are useful. 
This was well known to the pioneers, who showed great skill in devising 
models that do this. Unfortunately, one aspect of the evolutionary process 
with which they were quite familiar was not sufficiently emphasized by 
them, and this has led to a recurring error by a succession of analysts, not
usually geneticists, concerning the possibility of the evolution of the com- 
plex life forms that we see today by the Darwinian-Mendelian process. This 
error follows from an inappropriate model of the evolutionary process. The 
essence of the error can be seen from the following oversimplified example. 

Suppose that we wish to attain some desired sequence of 19 letters, for 
example THEGREATWALLOFCHINA. Here we might think of the first 
letter, “T”, as the first desired gene in a sequence, the second letter “H” 
as the second gene in the desired sequence, and so on. The incorrect model 
for an evolutionary process arriving at this sequence is as follows. Suppose 
that we randomly choose 19 letters. If they happen to form the desired 
sequence, we have evolved in one step to the desired sequence. However 
the probability of doing this is minute, being $26^{-19}$. In the much more
likely event that we did not form the desired sequence, the first sequence is
entirely discarded and a new sequence of 19 letters is formed. We continue
in this way until the desired sequence of genes happens to be reached, a
procedure taking a mean of $26^{19}$ steps. Even at one step per second, this
mean time is far longer than the time since the Big Bang. 

But this is an incorrect model of evolution. Assuming that each of the 
letters, or genes, in the desired sequence is itself desirable, a more plausible 
model is that after the first random sequence has been chosen, any letter 
in this sequence that happened to match the corresponding letter in the 
desired sequence is retained. At the second step, a random choice is then 
made for those letters that did not match the desired sequence. Any letters 
obtained at this second step that match the corresponding letter in the 
desired sequence are retained, along with any that were retained at the 
first step. This process continues until the correct letter is obtained at all
locations, a process taking on average only about a hundred steps.
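
The contrast between the two models can be made concrete. The sketch below takes the second model literally: at every step each still-unmatched position is redrawn uniformly from 26 letters, and matched positions are kept. Each position is then first matched at a geometric time with success probability 1/26, and the process stops at the largest of 19 such times. The function names are ours, and the simulation is only a cross-check on the exact sum.

```python
import random

ALPHABET_SIZE = 26
TARGET_LENGTH = 19

def expected_steps_with_retention(length=TARGET_LENGTH, k=ALPHABET_SIZE, tol=1e-15):
    """Exact mean number of steps when matched letters are retained.

    With T the maximum of `length` independent geometric times,
    E[T] = sum over n >= 0 of P(T > n) = sum of 1 - (1 - q**n)**length,
    where q = 1 - 1/k is the chance that one position fails to match.
    """
    q = 1.0 - 1.0 / k
    total, n = 0.0, 0
    while True:
        term = 1.0 - (1.0 - q**n) ** length
        if term < tol:
            return total
        total += term
        n += 1

def simulate_steps_with_retention(rng, length=TARGET_LENGTH, k=ALPHABET_SIZE):
    """One simulated run: redraw only the positions that do not yet match."""
    unmatched, steps = length, 0
    while unmatched > 0:
        steps += 1
        # A redrawn position stays unmatched with probability (k - 1) / k.
        unmatched = sum(1 for _ in range(unmatched) if rng.randrange(k) != 0)
    return steps

if __name__ == "__main__":
    rng = random.Random(0)
    runs = [simulate_steps_with_retention(rng) for _ in range(2000)]
    print("exact mean steps with retention:", round(expected_steps_with_retention(), 1))
    print("simulated mean (2000 runs):     ", sum(runs) / len(runs))
    print("mean steps without retention:   ", ALPHABET_SIZE ** TARGET_LENGTH)
```

On this literal reading the mean is roughly 90 steps, against a mean of $26^{19}$ steps (about $8 \times 10^{26}$) when every failed sequence is discarded in full.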

While this second process is still a very crude representation of reality, it 
does model the genetic evolutionary process more appropriately than does 
the first process. A gene that is good for vision is not thrown out, but is 
retained, while a gene that is good for some other function evolves.

Clearly evolution is not aiming at some a priori target, but might arrive 
at the equally effective ATITANICCHINESEWALL instead of the sequence 
above. This does not affect the broad conclusion of the above argument. 

It is a pity that a small proportion of scientists, often outside the field 
of genetics, regularly re-invent the incorrect modeling paradigm, since the 
negative views of the possibility of evolution that they form are then seized 
upon by creationists as support for their arguments. Fisher, Haldane and 
Wright all described the correct paradigm, or model, quite clearly, but 
unfortunately their message was not sufficiently absorbed into the theory, 
nor into scientific circles generally. 

At a more minor level, there are other aspects of modeling theory that 
are often overlooked within the population genetics literature. Perhaps
unfortunately, the simple Wright-Fisher model, discussed at length in Section
1.4.3, has assumed a “gold standard” status, and serves as a reference
distribution for several calculations in population genetics theory. This has 
arisen largely for historical reasons, and the fact that this is only one model 
among many, and is far less general and plausible than the Cannings model 
discussed in much detail later, is generally overlooked. We mention two ex- 
amples where the fact that the Wright-Fisher model is no more than a 
reference model has been often overlooked, with unfortunate consequences. 

First, the concept of the “effective population size”, discussed in more 
detail in Section 3.7, is defined with reference to the simple Wright-Fisher 
model (1.48). A certain model has effective population size $N_e$ if some
characteristic of the model has the same value as the corresponding
characteristic for the simple Wright-Fisher model (1.48) whose actual size is
$N_e$. Further, the comparison of several characteristics is possible, and this
leads to different varieties of effective population size. Except in simple
cases, the concept is not directly related to the actual size of a population.
For example, a population might have an actual size of 200 but, because 
of a distorted sex ratio, have an effective population size of only 25. This 
implies that some characteristic of the model describing this population, 
for example a leading eigenvalue, has the same numerical value as that 
of a Wright-Fisher model with a population size of 25. It would be more 
indicative of the meaning of the concept if the adjective “effective” were 
replaced by “in some given respect Wright-Fisher model equivalent”. Mis- 
interpretations of effective population size calculations frequently follow 
from a misunderstanding of this fact. The concluding comments of Section 
3.7 discuss this point at length. 
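
To see how an actual size of 200 can correspond to an effective size near 25, the sketch below uses the standard two-sex formula $N_e = 4N_mN_f/(N_m + N_f)$, where $N_m$ and $N_f$ are the numbers of males and females. This formula is not stated in the present section and is brought in here as an assumption. The particular splits between males and females are hypothetical.

```python
def effective_size_two_sexes(n_males, n_females):
    """Two-sex effective size 4 * Nm * Nf / (Nm + Nf) (formula assumed here)."""
    return 4.0 * n_males * n_females / (n_males + n_females)

if __name__ == "__main__":
    for n_males in (100, 50, 20, 7):
        n_females = 200 - n_males
        n_e = effective_size_two_sexes(n_males, n_females)
        print(f"{n_males:3d} males, {n_females:3d} females: N_e = {n_e:.1f}")
```

With an even sex ratio the formula returns the actual size, while with 7 males and 193 females it gives about 27, close to the value used in the example above.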

Second, the fundamental genetic parameter $\theta$ will be introduced in
Section 3.6 in the discussion of the Wright-Fisher model (3.72). For that model
$\theta$ assumes the value $4Nu$, and the identification of $\theta$ and $4Nu$ is very
common in the literature. However, for models other than Wright-Fisher
models a different definition of $\theta$ is needed. This is particularly true of the
exchangeable model of Cannings (1974) introduced in Section 3.3, which
provides a most important generalization of Wright-Fisher models, and is
also true of the Moran model introduced in Section 3.4. Much of the
discussion in Chapter 9 refers to this point. The identification of $\theta$ with $4Nu$
arises in effect from an inappropriate assumption that the simple
Wright-Fisher model (1.48) is the stochastic evolutionary model relevant to the
situation at hand. The rather more general definition of $\theta$ as $4N_eu$ partly
overcomes this problem, but does not do so entirely, since (as mentioned
above) there are several distinct concepts of the effective population size
$N_e$.



1.7 Overall Evolutionary Theories 

We now outline the two contrasting views of evolution arrived at by Fisher 
and Wright. 

Fisher’s view was focused on the long term. With this perspective, his
evolutionary view is in a way a simple one. He considered populations to be
very large: the numerical values used in Fisher (1930a) for population
size are often of order $10^9$ or larger. Thus apart from the particular case
of the probability of survival of an individual new mutant, stochastic ef- 
fects are not regarded as being of central importance, and deterministic 
analyses are seen as being sufficient to describe the essence of evolutionary 
behavior. Even in the case of new mutants, where a stochastic analysis is 
unavoidable, we have seen that essentially deterministic behavior arises for 
recurrent mutations in large populations. Thus, once the genetic raw ma- 
terial has been furnished by mutation, natural selection is regarded as the 
sole important agency in shaping genetic evolution.

The nature of this selection is also seen as being rather straightforward. In
the first place, since complexes of genes at various loci, even if harmonious, 
tend to be broken up ultimately by recombination, a stronger emphasis is 
placed on genes at single loci than that placed on gene complexes. This 
is not to deny the fact that, as we have just seen, Fisher viewed
interactive systems as being important. But, for example, so far as evolutionary
processes are concerned, an interactive system such as (1.92)
simply has the effect of yielding a selective advantage to $M_1$ over $M_2$, and
the primary emphasis is placed on this fact. This leads to the point of view
(Fisher (1953)) that “it is often convenient to consider a natural population 
not so much as an aggregate of living individuals but as an aggregate of 
gene ratios”. Fisher would have regarded this view as an approximation, 
but one which is nevertheless sufficient to describe the main characteristics 
of evolution. This view pervades, directly or indirectly, his work not only in 
population genetics but also, interestingly enough, in the statistical theory 
of experimental design (see, for example, Fisher (1926, p. 511)), which was
strongly influenced by, if indeed not suggested by, his research in genetics. 

In population genetics a corollary of this view is that frequencies of ga- 
metes can be found, at least to a sufficient approximation, as the product 
of the frequencies of the constituent alleles. This approximation is implicit 
in his pioneer work in both quantitative and evolutionary genetics, except 
in special cases involving, for example, assortative mating. Thus, for
example, in both fields the total additive genetic variance, a quantity of central
importance, appears to be defined by him as the sum of the constituent 
one-locus marginal values (Fisher (1918, p. 405; 1958, p. 37)). We shall 
see later that while this is correct if indeed gamete frequencies can be so 
calculated, it is not generally so. 

A further characteristic of Fisher’s evolutionary view, arising from the 
above considerations and the assumed very large sizes of populations, is 
that an allele having a net selective advantage, no matter how small, is 
destined for fixation, at least while the selective advantage persists. Thus, 
for example, one of his main objectives in putting forward his theory of 
the evolution of dominance through the natural selection of modifiers was 
to show that even a minute selective force would have evolutionary con- 
sequences. This was seen as being so even if the modifiers are subject to 
selective forces other than through dominance modification. Fisher’s
reasoning on this point (in particular Fisher (1934, pp. 372-373)) is not clear
to this writer, who shares Wright’s (1934) doubts on its acceptability. 

Against these views should be set the fact that Fisher’s Fundamental 
Theorem of Natural Selection, which we examine in detail in Sections 2.9 
and 7.4.5, is a fully multilocus, indeed entire genome, result. Contrary
to the conventional view of it, the theorem does not assume
random mating, whereas a high proportion of Wright’s mathematical work, 
discussed in more detail below, does make this assumption.

To summarize, Fisher’s view on the nature of evolution involves large
population sizes, an emphasis on the long term and on the main effects of 
single loci as contrasted with complexes of loci, and a steady and essentially 
deterministic increase in the frequency of each allele having a selective 
advantage, no matter how small, with regard to the various alternative 
alleles at its locus. Evolution can be viewed to a large extent on a locus- 
by-locus basis, and the net evolutionary pattern can be found by “adding” 
together such single-locus events. 

Fisher’s view has a grand simplicity to it. Is it, however, simplistic? The 
evolutionary theory reached by Wright (1931, 1956, 1960, 1965b, 1969b) 
appears, at least at first sight, to be more subtle. Wright arrived at his 
view of evolution by discussing in turn several modes of the way in which 
gene substitution by selection can occur. He first considered selection in a 
very large random-mating population in a stable environment. The rates of 
change of gene frequency can be assumed to follow, at least to a reasonable 
approximation, differential equations of the form (1.27). Successive substi- 
tutional processes depend on the occurrence of favorable new mutations, 
and these are seen as arising sufficiently rarely so that evolution in this 
manner takes place too slowly to be effective. 

This led to a view of the circumstances most favorable to evolution that 
is more complex than Fisher’s, and of a different nature. Wright proposed 
a three-phase process under which evolution could most easily occur. This 
view assumes that large populations are normally split up into semi-isolated 
subpopulations, or demes, each of which is comparatively small in size. 
Within each deme there exists a genotypic fitness surface, depending on 
the genetic constitution at many loci, and in conformity with the “increase 
in mean fitness” concept, gene frequencies tend to move so that local peaks 
in this surface are approached. The surface of mean fitness is assumed to 
be very complex with a multiplicity of local maxima, some higher than others. 
If a fully deterministic behavior obtains, the system simply moves to 
the nearest selective peak and remains there. The importance of the comparatively 
small deme size is that such strict deterministic behavior does 
not occur: Random drift can move gene frequencies across a saddle and 
possibly under the control of a higher selective peak. Random changes in 
selective values can also perform the same function. In this way a succession 
of peaks can be reached, each one higher than the previous one. Interpop- 
ulational selection, arising from migration of individuals from demes which 
have higher selective peaks than have other demes, allows the favorable 
gene complex to spread ultimately throughout the entire population. The 
unit of selection here is the entire gene complex and not individual alleles. 
Indeed the latter are viewed as often having no absolute selective advan- 
tage, being perhaps favorable in some gene combinations but unfavorable 
in others. 

A case where evolution can more easily take place under this mode com- 
pared to that of Fisher is that of two alleles, one at each of two loci, which 
individually are deleterious but together are favorable. Calling the alleles 
in question $A_1$ and $B_1$, one selective scheme where this might occur is the 
following: 

$$
\begin{array}{c|ccc}
 & B_1B_1 & B_1B_2 & B_2B_2 \\ \hline
A_1A_1 & 1+r & 1+s & 1-t \\
A_1A_2 & 1+s & 1 & 1-u \\
A_2A_2 & 1-t & 1-u & 1
\end{array}
\tag{1.96}
$$

Here $r > s > 0$ and $t > u > 0$. Under a deterministic scheme the frequencies 
of $A_1$ and $B_1$, if initially small, will be kept small (because of the selective 
disadvantage of $A_1A_2B_2B_2$ and $A_2A_2B_1B_2$ to $A_2A_2B_2B_2$). If however in 
one deme the frequencies of $A_1$ and $B_1$ can reach a sufficiently high value, 
the selective advantage of $A_1A_1B_1B_1$, and to a lesser extent of $A_1A_1B_1B_2$ 
and $A_1A_2B_1B_1$, will lead to fixation of $A_1$ and $B_1$. In terms of the previous 
discussion, this implies passing across a saddle from a selective peak at 
frequency ($A_1$) = frequency ($B_1$) = 0 to a higher selective peak at frequency 
($A_1$) = frequency ($B_1$) = 1. By migration the favored complex, involving 
the gamete $A_1B_1$ in high frequency, is now assumed to spread to all demes. 

It will be clear that Wright’s emphasis, at least compared to Fisher’s, was 
on interactive genetic systems in which most characters are affected by the 
genes at many loci and most genes have pleiotropic effects, that is, influence 
several characters. Fisher was of course fully aware of the importance of 
the interactive nature of genetic systems, as his work on the evolution of 
dominance shows. However, his view tended to the claim that in the very 
long term, the effects of single genes would be important. Wright’s view was 
no doubt strongly influenced by his early experimental work on the coat 
color of guinea pigs, which revealed the importance of these interactive effects. 
From the very first (see, in particular, Wright (1935, 1952, 1969b)) 
his conceptual framework involved multilocus analysis and in particular an 
examination of the “optimum” model (1.91) and its various generalizations 
for more than two loci. We examine the model (1.91) in more detail in 
Chapter 6, and will find that Wright’s analysis of this model is flawed. His 
analysis uses gene frequencies rather than the correct gametic frequencies, 
and a correct analysis using gametic frequencies shows that the equilibrium 
point of this model, which he investigated, is unstable and thus of no 
interest. 

This leads to a further criticism of his work from a mathematical point 
of view. The only multilocus model that he analyzed mathematically is 
the model (1.91) discussed above. Thus despite his emphasis on multilocus 
fitness systems, he never analyzed one in an appropriate mathematical way. 
Further, the mean fitness increase theorem, a central feature of his fitness 
surface analysis, will be shown in Chapters 6 and 7 not to be correct as a 
mathematical theorem in the multilocus case. 

Finally, just as we asked whether Fisher’s view of evolution in Mendelian 
populations is too simplistic, it is equally reasonable to ask whether 
Wright’s overall views, particularly those involving population subdivision 
with migration between partially isolated demes, are not too complex. His 
picture of evolution may well rely on an equipoise of migration rates, fit- 
ness differentials and deme sizes of an unrealistically finely-tuned nature. 
We shall examine this point later when assessing the role of these various 
factors, and of linkage, in evolution. It should however be mentioned that 
the facile criticism of Wright’s evolutionary theory, that random drift is 
seen as an alternative to selection, has no basis in reality. Random drift is 
conceived of as acting merely as a trigger mechanism in the first phase of 
the process, changing gene frequencies within each deme before the more 
permanent and important changes brought about by selection. 

The debate about whether Fisher’s broad view or Wright’s broad view 
is the more appropriate continues, pointlessly, to this day. Whatever differences 
Fisher and Wright may have had, they are dwarfed by their agreement 
on the need to formulate a new evolutionary theory based on Mendelian 
genetics and the essential identity of much of their (separate) calculations 
concerning this new evolutionary process. This implies that it is necessary 
to be familiar with their work, but also necessary to move forward from the 
paradigms established by these two giants. This volume is intended, on the 
one hand, to summarize mathematical aspects of this evolutionary theory 
as it was developed by Fisher, Haldane and Wright and their immediate 
successors, and on the other hand to introduce the molecular genetics-based 
contemporary theory. In the latter aim it is intended to form the basis of 
some of the material to be discussed in Volume II. 




2 

Technicalities and Generalizations 



2.1 Introduction 

This chapter is largely technical in nature. Its aim in part is to consider 
in more detail some of the theoretical points raised in Chapter 1, and in 
part to put these in a setting that allows a more detailed and up-to-date 
discussion of them in later chapters. A second aim is to introduce some 
further techniques not discussed in Chapter 1. Some rather straightforward 
generalizations of the theory are also made. Finally, the statement of the 
Fundamental Theorem of Natural Selection for one gene locus will be given 
and proved. 

Population genetics models often make a number of simplifying assump- 
tions, for example that random mating obtains, that fitnesses are fixed 
constants, that the population size is effectively infinite, and so on. In this 
chapter we consider what happens when some of these assumptions are 
relaxed or even dropped altogether. It is difficult enough to consider the 
effect of relaxing two or three of these assumptions simultaneously and 
quite impossible to consider the effect of relaxing them all. In the various 
sections of this chapter we therefore consider one or other generalization 
of the theory brought about by relaxing one or other of these assumptions, 
without attempting to assess the effect of simultaneous relaxation of two 
or more assumptions. Such an assessment must, at the moment, be largely 
nonquantitative. 
2.2 Random Union of Gametes 

In elementary textbooks the way in which the frequencies of the various 
genotypes in a daughter generation are derived from those in the parent 
generation is by means of a two-way table. All the various possible matings 
are listed, their frequencies and the relative frequencies with which they 
produce various offspring genotypes are noted, and thus the frequencies 
of the daughter generation genotypes are calculated. This procedure was 
outlined in Chapter 1 for the case of non-random-mating populations. It 
is far more efficient, however, for random-mating populations, to proceed 
in a different way. Restricting attention to autosomal loci, we observe that 
each individual transmits, for each locus, one gene to each of his/her off- 
spring: The union of two such genes, one from each parent, defines at that 
locus the genotype of the offspring individual. Random mating of parents is 
equivalent to random union of genes. Thus, for example, using the notation 
of Section 1.2, since the frequency of $A_1$ in the parent generation is $X + Y$, 
the frequency of $A_1A_1$ in the daughter generation, being the probability 
that two genes drawn at random from the parent generation are both $A_1$, is 
$(X + Y)^2$. This argument, and parallel arguments for the other genotypes, 
together give equations (1.1)-(1.3) immediately. Only minor extensions of 
the argument are needed for more complex cases such as sex-linked loci, 
multiple alleles, dioecious populations, and so on, and we use this form of 
argument below in developing the properties of these more complex models. 
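
The random-union argument can be checked numerically. The following is a minimal sketch, assuming the Section 1.2 convention that the parental frequencies of $A_1A_1$, $A_1A_2$ and $A_2A_2$ are $X$, $2Y$ and $Z$; the function name `random_union` and the starting frequencies are illustrative choices, not part of the text.

```python
def random_union(X, Y, Z):
    """Return daughter-generation (X', Y', Z') under random union of gametes.

    Parental genotype frequencies are X (A1A1), 2Y (A1A2) and Z (A2A2),
    so X + 2Y + Z = 1. The returned triple uses the same convention.
    """
    p = X + Y  # probability that a randomly drawn gene is A1
    q = Y + Z  # probability that a randomly drawn gene is A2
    # Two independent draws: A1A1 has frequency p*p, A1A2 has 2*p*q
    # (so Y' = p*q), and A2A2 has q*q.
    return p * p, p * q, q * q


parents = (0.5, 0.1, 0.3)                 # X = 0.5, 2Y = 0.2, Z = 0.3
daughters = random_union(*parents)
granddaughters = random_union(*daughters)
print(daughters, granddaughters)
```

Applying the function a second time leaves the frequencies unchanged, which is the Hardy-Weinberg property in this simple setting.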

It was stated in Section 1.6 that explicit models should be set up be- 
fore any mathematical analysis is attempted, so it is necessary to state 
more explicitly the model assumed in the above argument. It has been as- 
sumed that the population is monoecious, of effectively infinite size and 
that any daughter-generation individual is formed by the mating of two 
randomly chosen individuals of the parent generation. It is also assumed 
that there are no geographical effects, no mating success differentials, and 
so on. Perhaps most important, it is also assumed that distinct generations 
can be recognized, so that matings occur only between individuals of the 
same generation, and that these individuals do not participate in further 
mating once the daughter generation is formed. These assumptions imply 
that there is no population age structure. Later, models with assumptions 
that are more general than, and also rather different from, these will be 
introduced. 



2.3 Dioecious Populations 



In this section we drop the assumption that the population is monoecious 
and suppose instead that it is dioecious, that is admits two sexes. The other 
assumptions of the previous section are maintained. We focus initially on 
the autosomal case, deferring the analysis of the sex-linked case to later. 

Suppose first there is no selection, and that in a given generation the 
genotypic frequencies are as given in (2.1) below: 





$$
\begin{array}{lccc}
 & A_1A_1 & A_1A_2 & A_2A_2 \\
\text{males:} & X_M & 2Y_M & Z_M \\
\text{females:} & X_F & 2Y_F & Z_F
\end{array}
\tag{2.1}
$$





The argument of the random union of gametes, suitably modified to the 
dioecious case, shows that the frequency of $A_1A_1$ individuals among both 
males and females of the daughter generation is $(X_M + Y_M)(X_F + Y_F)$, 
with parallel formulas for $A_1A_2$ and $A_2A_2$. This implies that after one 
further generation of random mating the frequencies in both sexes are in 
the Hardy-Weinberg form 

$$
\begin{array}{ccc}
A_1A_1 & A_1A_2 & A_2A_2 \\
x^2 & 2x(1-x) & (1-x)^2
\end{array}
$$

where 

$$
x = \tfrac{1}{2}(X_M + X_F + Y_M + Y_F). \tag{2.3}
$$

The frequencies of the three genotypes among males and among females 
now remain equal in all further generations. For this reason we often make 
the modeling simplification of ignoring the existence of two sexes, except 
of course in special cases, for example in discussing the sex ratio. 
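
The two-generation approach to Hardy-Weinberg form can be illustrated numerically. This is a small sketch, assuming the layout of (2.1) with each sex described by a triple $(X, Y, Z)$ standing for genotype frequencies $X$, $2Y$, $Z$; the function name `two_sex_step` and the starting values are hypothetical.

```python
def two_sex_step(males, females):
    """One generation of random mating without selection, two sexes.

    Each argument is (X, Y, Z), meaning genotype frequencies X, 2Y, Z
    for A1A1, A1A2, A2A2. The returned triple applies to both sexes
    of the daughter generation.
    """
    XM, YM, ZM = males
    XF, YF, ZF = females
    pm = XM + YM              # A1 frequency among male gametes
    pf = XF + YF              # A1 frequency among female gametes
    X = pm * pf
    Y = 0.5 * (pm * (1 - pf) + pf * (1 - pm))
    Z = (1 - pm) * (1 - pf)
    return X, Y, Z


males = (0.6, 0.1, 0.2)       # 0.6 + 2*0.1 + 0.2 = 1
females = (0.1, 0.1, 0.7)     # 0.1 + 2*0.1 + 0.7 = 1
gen1 = two_sex_step(males, females)   # equal in the two sexes, not yet Hardy-Weinberg
gen2 = two_sex_step(gen1, gen1)       # Hardy-Weinberg form
x = 0.5 * (males[0] + females[0] + males[1] + females[1])   # equation (2.3)
print(gen1, gen2, x)
```

The first generation equalizes the genotype frequencies in the two sexes, and the second puts them in Hardy-Weinberg form with the value of $x$ given by (2.3).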

One case where the existence of two sexes has to be taken into account 
is that where genotype fitness values are different in males and females. 
Suppose then that viability selection exists, so that the relative fitnesses of 
the genotypes $A_1A_1$, $A_1A_2$ and $A_2A_2$ in males are $w_{11}$, $w_{12}$ and $w_{22}$, with 
corresponding values $v_{11}$, $v_{12}$ and $v_{22}$ in females. We consider genotypic 
frequencies immediately after the formation of the zygotes of any generation, 
and suppose that in a given generation the males produce $A_1$ gametes with 
frequency $x$ and $A_2$ gametes with frequency $1 - x$. Let the corresponding 
frequencies for females be $y$ and $1 - y$. Then at the time of conception of 
the zygotes in the daughter generation the genotypic frequencies are, in 
both sexes, 

$$
\begin{array}{ccc}
A_1A_1 & A_1A_2 & A_2A_2 \\
xy & x(1-y) + y(1-x) & (1-x)(1-y)
\end{array}
$$

By the age of maturity these frequencies will have been altered by 
differential viability to the relative values 

$$
\begin{array}{lccc}
 & A_1A_1 & A_1A_2 & A_2A_2 \\
\text{males:} & w_{11}xy & w_{12}\{x(1-y) + y(1-x)\} & w_{22}(1-x)(1-y) \\
\text{females:} & v_{11}xy & v_{12}\{x(1-y) + y(1-x)\} & v_{22}(1-x)(1-y)
\end{array}
$$

The frequencies $x'$ and $y'$ of $A_1$ gametes produced by males and females of 
the daughter generation are thus 

$$
x' = \frac{w_{11}xy + \tfrac{1}{2}w_{12}\{x(1-y) + y(1-x)\}}
          {w_{11}xy + w_{12}\{x(1-y) + y(1-x)\} + w_{22}(1-x)(1-y)}, \tag{2.4a}
$$

$$
y' = \frac{v_{11}xy + \tfrac{1}{2}v_{12}\{x(1-y) + y(1-x)\}}
          {v_{11}xy + v_{12}\{x(1-y) + y(1-x)\} + v_{22}(1-x)(1-y)}. \tag{2.4b}
$$


These recurrence relations cannot in general be solved explicitly. It is nev- 
ertheless possible to arrive at certain important properties concerning their 
equilibrium points. It is clear that if selection favors the same allele in both 
males and females there will be no internal equilibrium, so the two cases 
of real interest are, first, that where different genes are favored in the two 
sexes, and second, that where overdominance is involved. Our analysis of 
these two cases follows that of Kidwell et al. (1977). 
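
Since the recurrence relations (2.4) cannot in general be solved explicitly, it is often easiest to iterate them. The following is a minimal sketch under the assumptions of this section; the function names `step` and `iterate`, and the fitness values used, are illustrative choices rather than anything taken from Kidwell et al.

```python
def step(x, y, w, v):
    """One generation of the two-sex viability recurrences (2.4a)-(2.4b).

    x, y : frequency of A1 among male and female gametes.
    w, v : (w11, w12, w22) for males and (v11, v12, v22) for females.
    """
    w11, w12, w22 = w
    v11, v12, v22 = v
    p11 = x * y                        # zygote frequency of A1A1
    p12 = x * (1 - y) + y * (1 - x)    # zygote frequency of A1A2
    p22 = (1 - x) * (1 - y)            # zygote frequency of A2A2
    x_next = (w11 * p11 + 0.5 * w12 * p12) / (w11 * p11 + w12 * p12 + w22 * p22)
    y_next = (v11 * p11 + 0.5 * v12 * p12) / (v11 * p11 + v12 * p12 + v22 * p22)
    return x_next, y_next


def iterate(x, y, w, v, generations):
    """Apply step() repeatedly and return the final (x, y)."""
    for _ in range(generations):
        x, y = step(x, y, w, v)
    return x, y


# Without selection, both sexes take the average frequency after one generation.
neutral = step(0.2, 0.6, (1, 1, 1), (1, 1, 1))
# The same allele favored in both sexes: no internal equilibrium, A1 is fixed.
favored = iterate(0.1, 0.1, (1.0, 0.9, 0.8), (1.0, 0.9, 0.8), 2000)
print(neutral, favored)
```

The second run illustrates the remark that no internal equilibrium exists when selection favors the same allele in both sexes.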

Suppose first there is no dominance in fitness for each sex and that 
selection acts in opposite directions in the two sexes. We thus write the 
fitnesses in the form 

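$$
\begin{array}{lccc}
 & A_1A_1 & A_1A_2 & A_2A_2 \\
\text{males} & 1 & 1 - \tfrac{1}{2}s_m & 1 - s_m \\
\text{females} & 1 - s_f & 1 - \tfrac{1}{2}s_f & 1
\end{array}
$$

where $s_m, s_f > 0$. Solution of the equilibrium equations $x = x'$, $y = y'$ 
gives, as the only possible equilibrium, 

$$
x = 1 - s_m^{-1} + \{(s_m s_f - s_m - s_f + 2)(2 s_m s_f)^{-1}\}^{1/2},
$$

$$
y = s_f^{-1} - \{(s_m s_f - s_m - s_f + 2)(2 s_m s_f)^{-1}\}^{1/2}.
$$

This equilibrium will be admissible ($0 < x < 1$, $0 < y < 1$) only if 

$$
\frac{s_m}{1 + s_m} < s_f < \frac{s_m}{1 - s_m} \tag{2.5a}
$$

or, equivalently, if 

$$
\frac{s_f}{1 + s_f} < s_m < \frac{s_f}{1 - s_f}. \tag{2.5b}
$$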



When these conditions apply the equilibrium can be shown to be stable. 
We conclude that, especially if $s_m$ and $s_f$ are small, additive selection acting 
in opposite directions in the two sexes will maintain a stable equilibrium 
only if the selective differences in the two sexes are fairly close. 
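
The equilibrium for additive selection in opposite directions can be checked against the recurrences (2.4). The sketch below assumes the closed form $x = 1 - s_m^{-1} + R$, $y = s_f^{-1} - R$ with $R = \{(s_m s_f - s_m - s_f + 2)/(2 s_m s_f)\}^{1/2}$, where $x$ and $y$ are the $A_1$ frequencies among male and female gametes; the function names and the values $s_m = 0.5$, $s_f = 0.4$ are illustrative assumptions.

```python
import math


def step(x, y, s_m, s_f):
    """Recurrences (2.4) for additive fitnesses acting in opposite directions."""
    w11, w12, w22 = 1.0, 1.0 - 0.5 * s_m, 1.0 - s_m      # males
    v11, v12, v22 = 1.0 - s_f, 1.0 - 0.5 * s_f, 1.0      # females
    p11 = x * y
    p12 = x * (1 - y) + y * (1 - x)
    p22 = (1 - x) * (1 - y)
    x_next = (w11 * p11 + 0.5 * w12 * p12) / (w11 * p11 + w12 * p12 + w22 * p22)
    y_next = (v11 * p11 + 0.5 * v12 * p12) / (v11 * p11 + v12 * p12 + v22 * p22)
    return x_next, y_next


def equilibrium(s_m, s_f):
    """Candidate internal equilibrium (x, y) in closed form."""
    root = math.sqrt((s_m * s_f - s_m - s_f + 2.0) / (2.0 * s_m * s_f))
    return 1.0 - 1.0 / s_m + root, 1.0 / s_f - root


def admissible(s_m, s_f):
    """Condition (2.5a); assumes 0 < s_m < 1."""
    return s_m / (1.0 + s_m) < s_f < s_m / (1.0 - s_m)


s_m, s_f = 0.5, 0.4
x_hat, y_hat = equilibrium(s_m, s_f)
x, y = 0.2, 0.2
for _ in range(5000):
    x, y = step(x, y, s_m, s_f)
print(admissible(s_m, s_f), (x_hat, y_hat), (x, y))
```

For these values the closed form is a fixed point of the recurrences and the iteration started away from it settles there, in line with the stability statement above.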

Suppose now that dominance is introduced, so that the fitness scheme 
becomes 



$$
\begin{array}{lccc}
 & A_1A_1 & A_1A_2 & A_2A_2 \\
\text{males} & 1 & 1 - h_m s_m & 1 - s_m \\
\text{females} & 1 - s_f & 1 - h_f s_f & 1
\end{array}
$$

An interesting special case occurs when $h_f + h_m = 1$. Here the conditions 
(2.5) that there exist a single stable internal equilibrium point continue to 
apply. When $h_f + h_m < 1$ there will be at most one equilibrium point, and 
the conditions on $s_m$ and $s_f$ for this to occur are rather less stringent than 
(2.5). Thus, speaking roughly, for smaller $h_f$ and $h_m$ values, a larger range 
of $s_m$ and $s_f$ values will lead to an equilibrium point. When $h_f + h_m > 1$ 
it is possible that more than one internal equilibrium point can arise, but 
the conditions for this are not given here. 

When directional selection obtains for one sex and overdominance in 
the other, one suspects that a stable polymorphic equilibrium is possible 
provided the directional selection is not too strong. We quantify this state- 
ment in a moment when considering conditions for a stable polymorphic 
equilibrium to exist. 

It is of considerable interest to ask how effective the existence of differ- 
ent selective schemes in the two sexes is in maintaining genetic variation 
compared to the corresponding effect when identical selective schemes ob- 
tain in the two sexes. We attack this question quantitatively by considering 
the conditions for the existence of an internal polymorphism. For practical 
purposes we may suppose that such a polymorphism exists when the two 
equilibria freq($A_1$) = 0 in males and females, freq($A_2$) = 0 in males and 
females, are both unstable. If we linearize the recurrence relations (2.4) 
around $x = y = 0$ and around $x = y = 1$, we find that the condition for an 
internal polymorphism is that both the inequalities 

$$
(w_{12}/w_{22}) + (v_{12}/v_{22}) > 2, \tag{2.6a}
$$

$$
(w_{12}/w_{11}) + (v_{12}/v_{11}) > 2 \tag{2.6b}
$$

should hold. These requirements are the natural extensions to the cor- 
responding monoecious population requirement that the heterozygote be 
more fit than both homozygotes. 

When $A_1$ is at a selective advantage in males (so that $w_{11} > w_{12} > w_{22}$) 
but overdominance applies in females (so that $v_{12} > v_{11}, v_{22}$), condition 
(2.6a) holds automatically. However, condition (2.6b) will hold only if the 
overdominance in females is sufficiently strong compared to the directional 
selection in males. Thus (2.6b) quantifies our earlier discussion of this point. 

How stringent are the conditions given in (2.6)? Suppose we normalize so 
that $w_{12} = v_{12} = 1$. The conditions (2.6) then reduce to the requirements 
that the harmonic means of $v_{11}$ and $w_{11}$, and that of $v_{22}$ and $w_{22}$, should 
both be less than unity. Since harmonic means are less than arithmetic 
means, this is a less stringent requirement than that the arithmetic means 
both be less than unity. In other words, the existence of different selective 
parameters in the two sexes provides a stronger mechanism for maintaining 
genetic polymorphism than taking average selective values over the two 
sexes would suggest. 
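
The contrast between harmonic and arithmetic means can be made concrete with a small numerical sketch. The function names and the fitness values below are hypothetical; the only inputs from the text are the conditions (2.6a) and (2.6b).

```python
def harmonic_mean(a, b):
    """Harmonic mean of two positive numbers."""
    return 2.0 / (1.0 / a + 1.0 / b)


def polymorphism_protected(w, v):
    """Check conditions (2.6a) and (2.6b).

    w = (w11, w12, w22) are male fitnesses, v = (v11, v12, v22) female ones.
    """
    w11, w12, w22 = w
    v11, v12, v22 = v
    cond_a = w12 / w22 + v12 / v22 > 2.0   # (2.6a): freq(A1) = 0 unstable
    cond_b = w12 / w11 + v12 / v11 > 2.0   # (2.6b): freq(A2) = 0 unstable
    return cond_a and cond_b


# Opposing directional selection, normalized so that w12 = v12 = 1.
males = (1.5, 1.0, 0.7)
females = (0.7, 1.0, 1.5)
arithmetic_11 = 0.5 * (males[0] + females[0])       # exceeds 1
harmonic_11 = harmonic_mean(males[0], females[0])   # below 1
print(polymorphism_protected(males, females), arithmetic_11, harmonic_11)
```

In this example the sex-averaged homozygote fitnesses exceed the heterozygote fitness, so averaging over sexes would suggest no polymorphism, yet both conditions (2.6) hold.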
The above analysis concerns autosomal loci, and clearly a special analysis 
is needed in the sex-linked case. Taking the males as the heterogametic 
sex, the frequencies of the various genotypes in the sex-linked case can be 
written 

$$
\begin{array}{cc|ccc}
\multicolumn{2}{c|}{\text{male}} & \multicolumn{3}{c}{\text{female}} \\
A_1 & A_2 & A_1A_1 & A_1A_2 & A_2A_2 \\
x & 1-x & Y_{11} & 2Y_{12} & Y_{22}
\end{array}
$$

If there is no selection, the discussion outlined in the previous section shows 
that the frequencies in the following generation are 

$$
\begin{aligned}
x' &= Y_{11} + Y_{12}, \\
Y_{11}' &= x(Y_{11} + Y_{12}), \\
2Y_{12}' &= x(Y_{12} + Y_{22}) + (1 - x)(Y_{11} + Y_{12}), \\
Y_{22}' &= (1 - x)(Y_{12} + Y_{22}).
\end{aligned}
$$

In contrast to the autosomal case, one generation of random mating is not 
sufficient to yield equal frequencies of $A_1$ in the two sexes. Nor does one 
further generation of random mating produce female genotypic frequencies 
in Hardy-Weinberg form. On the other hand, since 

$$
x' - (Y_{11}' + Y_{12}') = -\tfrac{1}{2}\{x - (Y_{11} + Y_{12})\},
$$

the absolute value of the difference between male and female frequencies 
of $A_1$ is halved between successive generations. For practical purposes we 
may thus assume that after a short time, these frequencies are equal: If 
this is so, one further generation of random mating yields frequencies in 
the form 

$$
\begin{array}{cc|ccc}
\multicolumn{2}{c|}{\text{males}} & \multicolumn{3}{c}{\text{females}} \\
A_1 & A_2 & A_1A_1 & A_1A_2 & A_2A_2 \\
z & 1-z & z^2 & 2z(1-z) & (1-z)^2
\end{array}
$$

where 

$$
z = \tfrac{1}{3}x + \tfrac{2}{3}(Y_{11} + Y_{12}).
$$
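
The halving of the male-female difference, and the constancy of the weighted frequency $\tfrac{1}{3}x + \tfrac{2}{3}(Y_{11} + Y_{12})$, can be seen by iterating the selection-free sex-linked recurrences. This is a sketch only; the function name `sex_linked_step` and the starting frequencies are illustrative assumptions.

```python
def sex_linked_step(x, Y11, Y12, Y22):
    """One generation of random mating at a sex-linked locus, no selection.

    x   : frequency of A1 among males (the heterogametic sex).
    Y11, 2*Y12, Y22 : female frequencies of A1A1, A1A2, A2A2.
    """
    q = Y11 + Y12                       # A1 frequency among female gametes
    x_next = q                          # sons receive their single gene from the mother
    Y11_next = x * q
    Y12_next = 0.5 * (x * (Y12 + Y22) + (1 - x) * q)
    Y22_next = (1 - x) * (Y12 + Y22)
    return x_next, Y11_next, Y12_next, Y22_next


state = (0.9, 0.1, 0.1, 0.7)            # females: 0.1 + 2*0.1 + 0.7 = 1
differences = []                        # male minus female frequency of A1
weighted = []                           # x/3 + 2(Y11 + Y12)/3
for _ in range(6):
    x, Y11, Y12, Y22 = state
    differences.append(x - (Y11 + Y12))
    weighted.append(x / 3 + 2 * (Y11 + Y12) / 3)
    state = sex_linked_step(*state)
print(differences)
print(weighted)
```

Each difference is minus one half of the previous one, while the weighted frequency stays fixed at the value $z$ that appears in the limiting Hardy-Weinberg form.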

When selection operates the behavior is clearly more complex, as is shown 
by Sprott (1957), Bennett (1957) and Cannings (1967, 1968). We do not 
go into details here, and in this book we give little attention, perhaps less 
than is deserved, to sex-linked genes, under the assumption that properties 
of autosomal loci are normally mirrored, perhaps with minor alterations, 
in the sex-linked case. 

While, in both autosomal and the sex-linked cases, the evolutionary be- 
havior of two-sex systems is slightly more complex than in the monoecious 
case, the important Mendelian properties of conservation of genetic varia- 
tion and the suitability of the Mendelian system for evolutionary processes 
continue to apply. 
2.4 Multiple Alleles 

We turn now to the case of multiple alleles, considering only random-mating 
populations. Suppose that at an autosomal locus $A$, alleles $A_1, A_2, \ldots, A_k$ 
can occur. We consider a model identical to that of Section 2.2 and assume 
there is no selection. If the frequency of $A_i$ in any generation is $x_i$, the 
concept of the random union of gametes shows that in the next generation the 
frequency of $A_iA_i$ will be $x_i^2$ and that of $A_iA_j$ ($i \neq j$) will be $2x_ix_j$. These 
frequencies are in generalized Hardy-Weinberg form and are maintained 
through future generations. 

Suppose now that viability differentials exist and that the fitness of $A_iA_j$ 
is $w_{ij}$. It is clear that if we continue to count individuals at the moment 
of conception of each generation, the genotypic frequencies are in Hardy-Weinberg 
form at that time. The gene frequencies will normally change 
from one generation to another, and the appropriate recurrence relations 
are 

$$
x_i' = x_i \sum_j w_{ij}x_j / \bar{w} \tag{2.7}
$$

$$
\phantom{x_i'} = x_i w_i / \bar{w}, \tag{2.8}
$$

the sum (as with all sums in this section) being over $1, 2, \ldots, k$. In this 
equation $w_i$, the “marginal fitness of the allele $A_i$”, is defined as 

$$
w_i = \sum_j w_{ij}x_j. \tag{2.9}
$$

In equations (2.7) and (2.8) the quantity $\bar{w}$, the mean fitness of the 
population, is defined by 

$$
\bar{w} = \sum_i x_i w_i = \sum_i \sum_j w_{ij}x_ix_j. \tag{2.10}
$$


In view of the statement of the mean fitness increase theorem in Section 
1.4, and the condition given there for the existence of a stable internal 
equilibrium point under the action of selection only, it is natural to ask 
whether the mean fitness increases from one generation to another in the 
multiple allele case, and to seek the conditions on the $w_{ij}$ that ensure a 
stable internal equilibrium point (that is, each $x_i > 0$) of gene frequencies. 

The most efficient proof that mean fitness increases in the multiple allele 
case was given by Kingman (1961a) and is reproduced in detail here. The 
daughter generation mean fitness $\bar{w}'$ is defined by $\bar{w}' = \sum_i \sum_j w_{ij}x_i'x_j'$, and 
we are required to prove that with this definition, $\bar{w}' - \bar{w} \geq 0$. Using (2.7), 
we obtain 

\begin{align*}
\bar{w}' &= \bar{w}^{-2}\Bigl(\sum_i \sum_j w_{ij}(x_iw_i)(x_jw_j)\Bigr) \\
 &= \bar{w}^{-2}\Bigl(\sum_i \sum_j \sum_m w_{ij}w_{im}x_ix_jx_mw_j\Bigr).
\end{align*}

By interchanging the roles of $j$ and $m$ we also have 

\begin{align*}
\bar{w}' = \bar{w}^{-2}\Bigl(\sum_i \sum_j \sum_m w_{ij}w_{im}x_ix_jx_mw_m\Bigr).
\end{align*}

Thus by averaging, we find 

\begin{align*}
\bar{w}' &= \tfrac{1}{2}\bar{w}^{-2}\Bigl(\sum_i \sum_j \sum_m w_{ij}w_{im}x_ix_jx_m(w_j + w_m)\Bigr) \\
 &\geq \bar{w}^{-2}\Bigl(\sum_i \sum_j \sum_m w_{ij}w_{im}(w_jw_m)^{1/2}x_ix_jx_m\Bigr) \tag{2.11} \\
 &= \bar{w}^{-2}\sum_i x_i\Bigl(\sum_j x_jw_{ij}(w_j)^{1/2}\Bigr)^2 \\
 &\geq \bar{w}^{-2}\Bigl(\sum_i x_i \sum_j x_jw_{ij}(w_j)^{1/2}\Bigr)^2 \tag{2.12} \\
 &= \bar{w}^{-2}\Bigl(\sum_j x_j(w_j)^{1/2}\sum_i x_iw_{ij}\Bigr)^2 \\
 &= \bar{w}^{-2}\Bigl(\sum_j x_j(w_j)^{3/2}\Bigr)^2 \\
 &\geq \bar{w}^{-2}\Bigl(\Bigl\{\sum_j x_jw_j\Bigr\}^{3/2}\Bigr)^2 \tag{2.13} \\
 &= \bar{w}^{-2}\Bigl(\sum_j x_jw_j\Bigr)^3 \\
 &= \bar{w}.
\end{align*}

In this sequence of steps the inequality (2.11) is justified by the inequality 
$\tfrac{1}{2}(a + b) \geq (ab)^{1/2}$ for positive quantities $a$ and $b$, and the inequalities (2.12) 
and (2.13) are justified by the convexity property $\sum x_ia_i^n \geq (\sum x_ia_i)^n$ for 
nonnegative $a_i$ and $n \geq 1$. If we assume each $x_i > 0$, this proof also shows 
that $\bar{w}' = \bar{w}$ if and only if $w_1 = w_2 = \cdots = w_k$, and when this is so, 

$$
w_i = \bar{w}, \quad i = 1, 2, \ldots, k. \tag{2.14}
$$

This equation and (2.8) together imply that $x_i' = x_i$, so that the system is 
at an equilibrium point. We thus conclude that in the evolutionary system 
(2.7), the mean fitness always increases except when the system has reached 
an equilibrium point, where of course it remains unchanged. This conclusion 
also applies when some of the $x_i$ are zero, although here of course (2.14) 
is true only for those values of $i$ for which $x_i$ is positive at the equilibrium 
point. 
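
The conclusion that mean fitness never decreases under (2.7) is easy to observe numerically. The following is a minimal sketch; the function names and the symmetric fitness matrix `W` are arbitrary illustrative choices, not values from the text.

```python
def marginal_fitnesses(x, W):
    """w_i = sum_j w_ij x_j, equation (2.9)."""
    k = len(x)
    return [sum(W[i][j] * x[j] for j in range(k)) for i in range(k)]


def mean_fitness(x, W):
    """Mean fitness, equation (2.10)."""
    return sum(xi * wi for xi, wi in zip(x, marginal_fitnesses(x, W)))


def step(x, W):
    """One generation of the recurrence (2.8): x_i' = x_i w_i / w_bar."""
    w = marginal_fitnesses(x, W)
    w_bar = sum(xi * wi for xi, wi in zip(x, w))
    return [xi * wi / w_bar for xi, wi in zip(x, w)]


# A symmetric viability matrix for k = 3 alleles (hypothetical values).
W = [[0.9, 1.0, 0.7],
     [1.0, 0.8, 1.1],
     [0.7, 1.1, 0.6]]
x = [0.2, 0.5, 0.3]
trajectory = [mean_fitness(x, W)]
for _ in range(200):
    x = step(x, W)
    trajectory.append(mean_fitness(x, W))
print(trajectory[0], trajectory[-1])
```

The recorded mean fitnesses form a nondecreasing sequence, as the theorem requires, whatever happens to the individual allele frequencies.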
In view of the discussion in Chapter 1, it is natural to ask whether the 
change in mean fitness can be approximated by $\sigma_A^2$, the additive genetic 
variance in fitness. The natural generalization of the procedure that led to 
(1.16) is to define $\sigma_A^2$ as the maximum sum of squares removed by $\alpha_1, \ldots, \alpha_k$ 
in the expression $S$, defined by 

$$
S = \sum_i \sum_j x_ix_j(w_{ij} - \bar{w} - \alpha_i - \alpha_j)^2. \tag{2.15}
$$

It is found that the values of the $\alpha_i$ that lead to the minimizing of $S$ are 

$$
\alpha_i = w_i - \bar{w}, \quad i = 1, 2, \ldots, k. \tag{2.16}
$$

From this it follows, after some algebra, that 

$$
\sigma_A^2 = 2\sum_i x_i(w_i - \bar{w})^2. \tag{2.17}
$$

When $k = 2$ this reduces to the value given by (1.42). 

We now wish to compare the expression in (2.17) with the mean fitness 
change $\bar{w}' - \bar{w}$, which we write as 

$$
\bar{w}' - \bar{w} = \bar{w}^{-2}\Bigl\{\sum_i \sum_j w_{ij}x_ix_jw_iw_j - \bar{w}^3\Bigr\}.
$$

If $w_{ij} = \bar{w} + \delta_{ij}$, $w_i = \bar{w} + \delta_i$, where the $\delta_{ij}$ are assumed small, this becomes, 
on ignoring terms of order $\delta_{ij}^3$ and taking fitnesses to be scaled so that $\bar{w}$ is close to unity, 

$$
\bar{w}' - \bar{w} \approx 2\sum_i x_i\delta_i^2. \tag{2.18}
$$

This is identical to (2.17), and we conclude that for small fitness differentials 
the increase in mean fitness is very closely approximated by the additive 
genetic variance in fitness. Thus, under the assumptions made, in particular 
that of small fitness differentials, the MFIT holds for an arbitrary number 
of alleles at the locus. When fitness differentials are not small a rather 
different conclusion is found (Seneta (1973)). 
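
The closeness of the mean fitness change to the additive genetic variance for small fitness differentials can be checked directly. The sketch below assumes fitnesses within a few percent of unity; the function names, the matrix `W` and the frequencies `x` are hypothetical.

```python
def marginal_fitnesses(x, W):
    """w_i = sum_j w_ij x_j."""
    k = len(x)
    return [sum(W[i][j] * x[j] for j in range(k)) for i in range(k)]


def mean_fitness(x, W):
    """Mean fitness of the population."""
    return sum(xi * wi for xi, wi in zip(x, marginal_fitnesses(x, W)))


def step(x, W):
    """One generation of viability selection, x_i' = x_i w_i / w_bar."""
    w = marginal_fitnesses(x, W)
    w_bar = sum(xi * wi for xi, wi in zip(x, w))
    return [xi * wi / w_bar for xi, wi in zip(x, w)]


def additive_variance(x, W):
    """Additive genetic variance in fitness, 2 * sum_i x_i (w_i - w_bar)^2."""
    w = marginal_fitnesses(x, W)
    w_bar = sum(xi * wi for xi, wi in zip(x, w))
    return 2.0 * sum(xi * (wi - w_bar) ** 2 for xi, wi in zip(x, w))


# Fitnesses close to unity, so that the fitness differentials are small.
W = [[1.00, 1.01, 0.99],
     [1.01, 0.98, 1.02],
     [0.99, 1.02, 1.00]]
x = [0.3, 0.3, 0.4]
change = mean_fitness(step(x, W), W) - mean_fitness(x, W)
variance = additive_variance(x, W)
print(change, variance)
```

With differentials of this size the two quantities agree to within a few percent; the agreement weakens as the differentials grow, consistent with the remark on large fitness differentials.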

Suppose that each $x_i$ is positive. Then (2.17) shows that $\sigma_A^2$ is zero if and 
only if $w_1 = w_2 = \cdots = w_k = \bar{w}$. If some of the $x_i$ are zero, the additive 
genetic variance $\sigma_A^2$ is zero if (2.14) applies for those values of $i$ for which 
$x_i$ is positive. In both cases the discussion above shows that $\sigma_A^2$ is zero 
if and only if the system is at an equilibrium point. We see later that in 
multilocus systems the identification just reached for one locus, namely 

$$
\sigma_A^2 = 0 \iff \text{population in equilibrium} \tag{2.19}
$$

no longer holds, although a restricted version of this conclusion can be 
found. 
We consider now the evolution of a metrical character, not necessarily 
fitness, under the evolutionary system (2.7). Consider some character which 
for $A_iA_j$ individuals takes the measurement $m_{ij}$. The mean value $\bar{m}$ of 
this character is given by $\bar{m} = \sum_i \sum_j x_ix_jm_{ij}$, and we wish to compute the 
change in this mean after one generation. To a first order of approximation, 

\begin{align*}
\Delta\bar{m} &= 2\sum_i \sum_j (\Delta x_i)x_jm_{ij} \\
 &= 2\sum_i (\Delta x_i)m_i \\
 &= 2\sum_i (\Delta x_i)(m_i - \bar{m}) \\
 &\approx 2\sum_i x_i(w_i - \bar{w})(m_i - \bar{m}), \tag{2.20}
\end{align*}

where we have defined $m_i$, the marginal measurement for the allele $A_i$, by 

$$
m_i = \sum_j x_jm_{ij}. \tag{2.21}
$$

A verbal description of this conclusion is that the change in the character 
is twice the covariance between marginal allelic values of the character 
itself and fitness. For further details, see Robertson (1966, 1968). When 
the character is fitness itself this conclusion reduces to that obtained in 
(2.18). 
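
The covariance description of the change in a metrical character can be illustrated with a short numerical sketch. It assumes fitnesses close to unity, so that the first-order approximation is reasonable; the function names and the matrices `W` (fitnesses) and `M` (measurements) are hypothetical.

```python
def marginals(x, T):
    """Marginal allelic values t_i = sum_j x_j t_ij for a symmetric table T."""
    k = len(x)
    return [sum(T[i][j] * x[j] for j in range(k)) for i in range(k)]


def mean_value(x, T):
    """Population mean sum_i sum_j x_i x_j t_ij."""
    return sum(xi * ti for xi, ti in zip(x, marginals(x, T)))


def selection_step(x, W):
    """One generation of viability selection, x_i' = x_i w_i / w_bar."""
    w = marginals(x, W)
    w_bar = sum(xi * wi for xi, wi in zip(x, w))
    return [xi * wi / w_bar for xi, wi in zip(x, w)]


W = [[1.00, 1.01, 0.99],       # fitnesses w_ij, close to unity
     [1.01, 0.98, 1.02],
     [0.99, 1.02, 1.00]]
M = [[10.0, 12.0, 11.0],       # measurements m_ij of some other character
     [12.0, 15.0, 13.0],
     [11.0, 13.0, 9.0]]
x = [0.3, 0.3, 0.4]

w, m = marginals(x, W), marginals(x, M)
w_bar, m_bar = mean_value(x, W), mean_value(x, M)
exact_change = mean_value(selection_step(x, W), M) - m_bar
covariance_form = 2.0 * sum(xi * (wi - w_bar) * (mi - m_bar)
                            for xi, wi, mi in zip(x, w, m))
print(exact_change, covariance_form)
```

The exact one-generation change and twice the covariance of marginal allelic values agree closely here, the small discrepancy coming from the neglected higher-order terms.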

We turn now to the condition under which a stable equilibrium of 
gene frequencies exists. We first assume that each $x_i$ is positive at the 
equilibrium. The equilibrium conditions (2.14) can be written 

$$
w_1 - w_i = 0, \quad i = 2, 3, \ldots, k, \qquad x_1 + x_2 + \cdots + x_k = 1, \tag{2.22}
$$

and this is just a system of $k$ linear equations in $k$ unknowns. It thus 
possesses no solution, one solution or an infinity of solutions. The first and 
third cases arise only for special values of the $w_{ij}$, such as, for example, 
when all fitnesses are equal. In practice it is of most interest to ignore these 
cases and suppose there is a unique solution of (2.22). Unfortunately this 
solution might be inadmissible, that is, the condition $0 < x_i < 1$, $i = 1, \ldots, k$, 
might not be met, and even if the equilibrium is admissible it 
need not be stable. Fortunately the stability criteria have been obtained 
(Kingman (1961b)). A unique admissible solution to (2.22) will be stable if 
and only if the matrix $W = \{w_{ij}\}$ has exactly one positive eigenvalue and at 
least one negative eigenvalue. In this case the system moves, for any initial 
frequency point for which each $x_i$ is positive, to this equilibrium. If the 
equilibrium (2.22) is not admissible or is unstable, the system (2.7) evolves 
in such a way that one or more alleles become eliminated. The behavior then 
becomes considerably more complicated, and in practice perhaps the best 
procedure is to note that the system always moves so that $\bar{w}$ is maximized, 
so that finding the maximum value of $\bar{w}$ subject to the constraints $0 \leq x_i \leq 1$, 
$\sum x_i = 1$, via Kuhn-Tucker theory for quadratic programming, will 
provide the stable equilibrium point. A result of Kingman (1961b) relevant 
to this is that if $W$ has $j$ positive eigenvalues, then at most $k - j + 1$ alleles 
will exist with positive frequencies at this equilibrium. 

As the simplest possible example of this theory we consider the case 
where all homozygotes have fitness $1 - s$ ($0 < s < 1$), and all heterozygotes 
have fitness 1. Clearly there is an admissible equilibrium point at $x_i = k^{-1}$. 
This will be stable if the matrix 

$$
W = \begin{pmatrix}
1-s & 1 & 1 & \cdots & 1 \\
1 & 1-s & 1 & \cdots & 1 \\
1 & 1 & 1-s & \cdots & 1 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & 1 & 1 & \cdots & 1-s
\end{pmatrix}
$$

has exactly one positive eigenvalue and at least one negative eigenvalue. 
But standard theory shows that the eigenvalues of this matrix are $k - s$, 
$-s, \ldots, -s$, and thus the stability conditions are indeed met. 
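
The eigenvalue claim for the symmetric overdominant model, and the resulting convergence to $x_i = k^{-1}$, can be verified without any linear algebra library. The sketch below checks the eigenvectors directly; the function names and the choice $k = 4$, $s = 0.3$ are illustrative assumptions.

```python
def overdominance_matrix(k, s):
    """Fitness matrix with 1 - s on the diagonal and 1 elsewhere."""
    return [[1.0 - s if i == j else 1.0 for j in range(k)] for i in range(k)]


def mat_vec(W, u):
    """Matrix-vector product W u."""
    return [sum(a * b for a, b in zip(row, u)) for row in W]


def step(x, W):
    """One generation of the recurrence (2.7)."""
    w = mat_vec(W, x)                              # marginal fitnesses
    w_bar = sum(xi * wi for xi, wi in zip(x, w))   # mean fitness
    return [xi * wi / w_bar for xi, wi in zip(x, w)]


k, s = 4, 0.3
W = overdominance_matrix(k, s)

ones = [1.0] * k                         # eigenvector for the eigenvalue k - s
contrast = [1.0, -1.0] + [0.0] * (k - 2) # one eigenvector for the eigenvalue -s
print(mat_vec(W, ones), mat_vec(W, contrast))

x = [0.7, 0.1, 0.1, 0.1]
for _ in range(500):
    x = step(x, W)
print(x)
```

The vector of ones is reproduced with factor $k - s$ and the contrast vector with factor $-s$; the remaining contrasts between allele 1 and alleles $3, \ldots, k$ behave in the same way, giving the $k - 1$ negative eigenvalues.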

We turn finally to the correlation between relatives in the $k$-allele system, 
and take as an example the correlation between father and son. Suppose 
the father has genotype $A_iA_i$ (and thus measurement $m_{ii}$). The son will be 
$A_iA_j$ (and have measurement $m_{ij}$) with probability $x_j$, and since the 
frequency of $A_iA_i$ fathers is $x_i^2$ this will make a contribution to the covariance 
of 

$$
x_i^2\sum_j x_jm_{ii}m_{ij} = x_i^2m_{ii}m_i. \tag{2.23}
$$

If the father is $A_iA_j$ (frequency $2x_ix_j$) the son will be $A_iA_i$ (probability 
$\tfrac{1}{2}x_i$) or $A_jA_j$ (probability $\tfrac{1}{2}x_j$), $A_iA_j$ (probability $\tfrac{1}{2}(x_i + x_j)$), $A_iA_l$ 
(probability $\tfrac{1}{2}x_l$) or $A_jA_l$ (probability $\tfrac{1}{2}x_l$), for $l \neq i, j$. The contribution to the covariance 
corresponding to this case is 

$$
2x_ix_jm_{ij}\bigl[\tfrac{1}{2}(x_1m_{i1} + \cdots + x_km_{ik}) + \tfrac{1}{2}(x_1m_{j1} + \cdots + x_km_{jk})\bigr]
= x_ix_jm_{ij}(m_i + m_j). \tag{2.24}
$$

Adding (2.23) over all $i$ and (2.24) over all $i, j$ ($i < j$) we arrive at the 
covariance 

$$
\sum_i x_i^2m_{ii}m_i + \sum_{i<j} x_ix_jm_{ij}(m_i + m_j) - \bar{m}^2 = \sum_i x_i(m_i - \bar{m})^2.
$$

This is just half the expression (2.17) (if we replace $w_{ij}$ by the more general 
$m_{ij}$), and in this way we recover expression (1.10) for the correlation in 
the measurement between father and son, where now both variance terms 
have the more general $k$-allele interpretation. Identical conclusions apply for 
other relationships, and we conclude that the correlation formulas found in 
Chapter 1 are not affected by the number of alleles at the locus in question. 
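
The father-son covariance can also be obtained by brute-force enumeration, which gives an independent check on the algebra. In the sketch below the father's two genes and the gene contributed by the mother are drawn independently with the allele frequencies; the function names and the measurement matrix `M` are hypothetical.

```python
from itertools import product


def father_son_covariance(x, M):
    """Covariance of father and son measurements under random mating.

    The father carries the ordered pair of genes (a, b); the son receives
    a or b with probability 1/2 each, plus a gene c from the mother.
    """
    k = len(x)
    mean = sum(x[a] * x[b] * M[a][b] for a, b in product(range(k), repeat=2))
    cross = 0.0
    for a, b, c in product(range(k), repeat=3):
        prob = x[a] * x[b] * x[c]
        cross += prob * M[a][b] * 0.5 * (M[a][c] + M[b][c])
    return cross - mean * mean


def half_additive_variance(x, M):
    """sum_i x_i (m_i - m_bar)^2, half the additive genetic variance."""
    k = len(x)
    marg = [sum(M[i][j] * x[j] for j in range(k)) for i in range(k)]
    mean = sum(x[i] * marg[i] for i in range(k))
    return sum(x[i] * (marg[i] - mean) ** 2 for i in range(k))


M = [[10.0, 12.0, 11.0],
     [12.0, 15.0, 13.0],
     [11.0, 13.0, 9.0]]
x = [0.3, 0.3, 0.4]
print(father_son_covariance(x, M), half_additive_variance(x, M))
```

The two numbers coincide, as the derivation via (2.23) and (2.24) requires, for any symmetric table of measurements and any allele frequencies.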
2.5 Frequency-Dependent Selection 

In all of the above, constant fitness values for each genotype have been 
assumed. It is likely in reality that many fitness values are not constant but 
depend on the number of individuals in the population, on the frequencies 
of the various alleles, or on both. In this short section we consider briefly 
some aspects of frequency-dependent selection. We assume the model of 
Section 2.2 with two alleles at the locus considered. 

Using the fitness scheme (1.25a) we arrived at the equation 

$$
\Delta x = x(1 - x)\{w_{11}x + w_{12}(1 - 2x) - w_{22}(1 - x)\}/\bar{w},
$$

and this equation continues to hold if the $w_{ij}$ are functions of the allele 
frequency $x$. Clearly there are equilibria when $x = 0$, $x = 1$, or when 

$$
w_{11}x + w_{12}(1 - 2x) - w_{22}(1 - x) = 0. \tag{2.25}
$$

If the functions $w_{ij}$ are sufficiently complex functions of $x$, (2.25) can have 
a number of solutions, several of which can be stable. There is little point 
in considering special cases. Further, $\bar{w}$ need not be maximized at an 
equilibrium point of the system. (2.25) and the equation $d\bar{w}/dx = 0$ show that 
mean fitness will not be maximized at an equilibrium if, at that equilibrium, 

$$
x^2\,dw_{11}/dx + 2x(1 - x)\,dw_{12}/dx + (1 - x)^2\,dw_{22}/dx \neq 0.
$$

Thus evolution can cause a steady decrease in mean fitness. In a classical 
example due to Wright (1948) it is supposed that the fitnesses of $A_1A_1$, 
$A_1A_2$, and $A_2A_2$ individuals are $1 - s + t(1 - x)$, $1$, and $1 + s - t(1 - x)$, 
where $s, t > 0$. If $s < t$ there is a point of stable equilibrium where $x = 
x^* = 1 - st^{-1}$, whereas the mean fitness is maximized at $\tfrac{1}{2}(\tfrac{1}{2} + x^*)$, halfway 
between $x^*$ and $\tfrac{1}{2}$. Clearly, for suitable initial frequencies of $A_1$, the mean 
fitness can steadily decrease during the course of evolution. 
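
Wright's example can be followed numerically. The sketch below writes the two positive parameters as `s` and `t`, with genotype fitnesses $1 - s + t(1 - x)$, $1$ and $1 + s - t(1 - x)$, and starts the population at the frequency where mean fitness is greatest; the function names and the values $s = 0.1$, $t = 0.4$ are illustrative assumptions.

```python
def fitnesses(x, s, t):
    """Frequency-dependent fitnesses of A1A1, A1A2, A2A2 in Wright's example."""
    return 1.0 - s + t * (1.0 - x), 1.0, 1.0 + s - t * (1.0 - x)


def mean_fitness(x, s, t):
    """Mean fitness at allele frequency x under Hardy-Weinberg proportions."""
    w11, w12, w22 = fitnesses(x, s, t)
    return x * x * w11 + 2.0 * x * (1.0 - x) * w12 + (1.0 - x) ** 2 * w22


def step(x, s, t):
    """One generation of the recurrence for the frequency x of A1."""
    w11, w12, w22 = fitnesses(x, s, t)
    w_bar = mean_fitness(x, s, t)
    dx = x * (1.0 - x) * (w11 * x + w12 * (1.0 - 2.0 * x) - w22 * (1.0 - x)) / w_bar
    return x + dx


s, t = 0.1, 0.4
x_star = 1.0 - s / t                  # stable equilibrium, here 0.75
x = 0.5 * (0.5 + x_star)              # frequency maximizing mean fitness, here 0.625
xs = [x]
means = [mean_fitness(x, s, t)]
for _ in range(50):
    x = step(x, s, t)
    xs.append(x)
    means.append(mean_fitness(x, s, t))
print(xs[0], xs[-1], means[0], means[-1])
```

Starting from the point of maximum mean fitness, the frequency of $A_1$ climbs toward the stable equilibrium while the mean fitness falls at every step.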



2.6 Fertility Selection 

Until now we have assumed that selection operates through viability
differentials. This assumption was made for mathematical convenience, and
we now suppose that further selective differences between genotypes arise
through differential fertility as well as through viability differences. The
analysis now becomes more complex, since fertility relates to mating
combinations rather than single genotypes. Our discussion assumes the natural
generalizations of the model of Section 2.2 and closely follows the work of
Bodmer (1965) and Kempthorne and Pollak (1970). We follow the natural
generalization of (1.25a) and suppose that the viability of an $A_iA_j$
genotype is $w_{ij}$ $(i, j = 1, \ldots, k)$ (assumed the same in both sexes) and that the
fertility of an $A_iA_j \times A_mA_n$ mating is $f_{ijmn}$. (We adopt some standard
ordering convention such that $A_iA_j$ is the male and $A_mA_n$ the female.) It
is clear that male and female genotypic frequencies will be equal: Let $X_{ij}$
be the frequency of $A_iA_j$ just before the conception of a new generation.
Those matings leading to $A_iA_i$ offspring must be of the form $A_iA_j \times A_iA_m$
for some $j$ and $m$. Consideration of the genotypic products of such matings
shows that the frequency of $A_iA_i$ at the birth of the next generation will
be proportional to



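$$Z_{ii} = f_{iiii}X_{ii}^2 + \tfrac{1}{2}\sum_{m \neq i} f_{iiim}X_{ii}X_{im} + \tfrac{1}{2}\sum_{j \neq i} f_{ijii}X_{ij}X_{ii} + \tfrac{1}{4}\sum_{j \neq i}\sum_{m \neq i} f_{ijim}X_{ij}X_{im}. \tag{2.26}$$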



These $A_iA_i$ individuals are now subject to viability selection between birth
and the age of maturity, and it follows that the frequency $X'_{ii}$ of $A_iA_i$ just
before the birth of the next following generation is given by

$$\mu X'_{ii} = w_{ii}Z_{ii}, \quad i = 1, 2, \ldots, k, \tag{2.27a}$$

where $\mu$ is a normalizing constant to be discussed later. Similar
considerations for $A_iA_j$ individuals yield

$$\mu X'_{ij} = w_{ij}Z_{ij}, \quad i, j = 1, 2, \ldots, k;\ i \neq j, \tag{2.27b}$$

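where

$$Z_{ij} = (f_{iijj} + f_{jjii})X_{ii}X_{jj} + \tfrac{1}{2}\sum_{m \neq j} f_{iijm}X_{ii}X_{jm} + \tfrac{1}{2}\sum_{m \neq i} f_{imjj}X_{im}X_{jj} + \tfrac{1}{4}\sum_{m \neq i}\sum_{n \neq j} f_{imjn}X_{im}X_{jn}.$$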



The constant $\mu$ in (2.27a) and (2.27b) is now chosen so that $\sum\sum X'_{ij} = 1$.
These recurrence relations are far too complex to solve in general, and we
make no attempt to do so. Questions concerning the existence and stability
of equilibrium points of the system (2.27) have been discussed by Hadeler
and Liberman (1975), but we do not pursue them here. Some simplification
is possible if it is supposed that the fertilities $f_{ijmn}$ are of the multiplicative
form



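$$f_{ijmn} = a_{ij}b_{mn} \quad (a_{ij} = a_{ji},\ b_{mn} = b_{nm}). \tag{2.28}$$

Introducing the new variables

$$x_i = \Bigl(a_{ii}X_{ii} + \tfrac{1}{2}\sum_{j \neq i} a_{ij}X_{ij}\Bigr) \Big/ \sum_i\sum_{j \leq i} a_{ij}X_{ij},$$

$$y_i = \Bigl(b_{ii}X_{ii} + \tfrac{1}{2}\sum_{j \neq i} b_{ij}X_{ij}\Bigr) \Big/ \sum_i\sum_{j \leq i} b_{ij}X_{ij}, \tag{2.29}$$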



the recurrence relations (2.27) become, for the multiplicative case,

$$\mu^* X'_{ii} = w_{ii}x_iy_i,$$

$$\mu^* X'_{ij} = w_{ij}(x_iy_j + x_jy_i), \quad i \neq j, \tag{2.30}$$

where $\mu^*$ is a new normalizing constant ensuring that the sum of genotypic
frequencies is unity. Use of (2.29) and (2.30) shows that

$$x'_i = \Bigl(a_{ii}w_{ii}x_iy_i + \tfrac{1}{2}\sum_{j \neq i} a_{ij}w_{ij}(x_iy_j + x_jy_i)\Bigr) \Big/ \sum_i\sum_j a_{ij}w_{ij}x_iy_j,$$

$$y'_i = \Bigl(b_{ii}w_{ii}x_iy_i + \tfrac{1}{2}\sum_{j \neq i} b_{ij}w_{ij}(x_iy_j + x_jy_i)\Bigr) \Big/ \sum_i\sum_j b_{ij}w_{ij}x_iy_j. \tag{2.31}$$

These recurrence relations are identical in form to those in (2.4), and thus
the conclusions for the latter system, once appropriate changes in fitnesses
have been made to include the viability parameters, continue to apply. Some
specific examples are given by Bodmer (1965). One question of particular
interest is whether the mean fitness of the system increases with time.
Unfortunately it is not at all evident that a natural definition for mean
fitness exists in the fertility selection case. Using (2.30) and the analogy
with previous recurrence systems, it would be reasonable to define mean
fitness as

$$\sum_i w_{ii}x_iy_i + \sum_i\sum_{i < j} w_{ij}(x_iy_j + x_jy_i). \tag{2.32}$$

With this definition, it is possible for mean fitness to decrease with time.
Thus (Kempthorne and Pollak (1970)) if $k = 2$, $w_{11} = w_{12} = 1$, $w_{22} = 0.5$,
$a_{11} = a_{12} = 1$, $a_{22} = 2$, $b_{11} = 0.25$, $b_{12} = b_{22} = 1$, $X_{11} = X_{22} = 0$, $X_{12} = 1$,
then $x_i = y_i = 0.5$, and the mean fitness, as defined by (2.32), is 0.875.
From (2.31), $x'_1 = x'_2 = 1/2$, $y'_1 = 5/11$, $y'_2 = 6/11$, and using these values
in (2.32) the daughter generation mean fitness is $19/22 \approx 0.864$. It is clear
that this decrease arises essentially because the genotype with highest
fecundity has lowest viability.
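
The arithmetic of this example can be reproduced exactly with rational numbers. The sketch below is an illustration only: the function names and the dictionary layout keyed by genotype are our own assumptions, and the code is restricted to $k = 2$. It evaluates (2.29), (2.30) and (2.32) for the parameter values quoted above.

```python
from fractions import Fraction as Fr

GENOTYPES = [(1, 1), (1, 2), (2, 2)]


def gametic_outputs(X, a, b):
    """x = (x1, x2) and y = (y1, y2) of (2.29) for two alleles."""
    def freqs(fert):
        total = sum(fert[g] * X[g] for g in GENOTYPES)
        first = (fert[(1, 1)] * X[(1, 1)] + Fr(1, 2) * fert[(1, 2)] * X[(1, 2)]) / total
        return first, 1 - first
    return freqs(a), freqs(b)


def mean_fitness(x, y, w):
    """Candidate mean fitness (2.32) for two alleles."""
    return (w[(1, 1)] * x[0] * y[0] + w[(2, 2)] * x[1] * y[1]
            + w[(1, 2)] * (x[0] * y[1] + x[1] * y[0]))


def next_generation(x, y, w):
    """Genotype frequencies given by (2.30), normalized to sum to one."""
    mu = mean_fitness(x, y, w)
    return {
        (1, 1): w[(1, 1)] * x[0] * y[0] / mu,
        (1, 2): w[(1, 2)] * (x[0] * y[1] + x[1] * y[0]) / mu,
        (2, 2): w[(2, 2)] * x[1] * y[1] / mu,
    }


# Parameter values of the Kempthorne and Pollak example quoted in the text.
w = {(1, 1): Fr(1), (1, 2): Fr(1), (2, 2): Fr(1, 2)}
a = {(1, 1): Fr(1), (1, 2): Fr(1), (2, 2): Fr(2)}
b = {(1, 1): Fr(1, 4), (1, 2): Fr(1), (2, 2): Fr(1)}
X = {(1, 1): Fr(0), (1, 2): Fr(1), (2, 2): Fr(0)}

x, y = gametic_outputs(X, a, b)
parent_fitness = mean_fitness(x, y, w)          # 7/8
X_next = next_generation(x, y, w)
x_next, y_next = gametic_outputs(X_next, a, b)
daughter_fitness = mean_fitness(x_next, y_next, w)   # 19/22
```

Working with `Fraction` rather than floating point makes the comparison $7/8 > 19/22$ exact.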

Suppose now that in (2.28), it is assumed that $a_{ij} = b_{ij}$. Then immediately
$x_i = y_i$, and at the birth of the new generation genotypic frequencies
are in Hardy-Weinberg form. Further the recurrence relations (2.31) are of
the form (2.7), and therefore the conclusions deriving from that system,
including in particular the result that the mean fitness, defined now as
$\sum\sum a_{ij}w_{ij}x_ix_j$, cannot decrease, continue to hold. The change in mean
fitness again is approximately equal to the additive genetic variance when
the latter is suitably defined so as to include both viability and fertility
parameters.

Despite this, it is possible that (2.32) is not a natural definition of the 
mean fitness of the infant population. The classical definition is that the 
fitness of any genotype is proportional to half the number of offspring 
individuals (of whatever genotype) from individuals of the genotype in 
question, counting being performed at the same stage of the life cycle. We 
now attempt to find an algebraic definition of mean infant fitness along 
these lines. 

Consider infants of genotype $A_iA_j$. These survive to adulthood with
probability $w_{ij}$. An $A_iA_j$ individual mating with an $A_mA_n$ individual has
$a_{ij}a_{mn}$ offspring, and crediting half of these to the $A_iA_j$ individual and
averaging over all $A_mA_n$, the $A_iA_j$ individuals are credited with a proportionate
amount

$$\tfrac{1}{2}w_{ij}a_{ij}\sum_m\sum_n x_mx_na_{mn}w_{mn}/\bar{w} = w_{ij}a_{ij}\bar{m}/2\bar{w}$$

of offspring, where $m_{ij} = a_{ij}w_{ij}$ and $\bar{m} = \sum\sum x_ix_ja_{ij}w_{ij}$. The mean
fitness of the infant population may then reasonably be defined as the
weighted average of these quantities, or

$$\sum_i\sum_j x_ix_jm_{ij}\bar{m}/2\bar{w} = (\bar{m})^2/2\bar{w}. \tag{2.33}$$

In a parallel fashion the mean fitness of the adult population may be
defined: Details are given by Kempthorne and Pollak (1970). Curiously,
neither the infant mean fitness, defined by (2.33), nor the adult mean
fitness need increase with time, decreases again possibly occurring
when those genotypes with high fertility have low viability. We do
not pursue this matter further and simply note the great complexity in
general of fertility selection models. During most of the rest of this book
selection will be taken to mean viability selection. This is no more than
a reflection of the fact that, because the mathematics of viability fitness
models is easier than that of fertility fitness models, more is known about
viability selection models.



2.7 Continuous-Time Models 

In all of this book so far it has been assumed that populations reproduce
at discrete time points. There are certainly some real-world populations
for which this is a reasonable assumption. On the other hand, it is
sometimes more appropriate biologically, or simpler mathematically, to use
continuous-time models in which births and deaths can take place at any
instant. This normally leads to mathematical systems where changes in
gene frequency are described by a differential equation or by differential
equation systems. In this section we outline some of these mathematical
models and discuss their properties, relying heavily on the definitive work
of Nagylaki (1974c, 1976), Nagylaki and Crow (1974) and Kimura (1958).

Consider a locus “A” in a monoecious population and let this locus admit
alleles $A_1, \ldots, A_k$. At a given time let the number of $A_iA_j$ individuals be
$n_{ij}$, where we adopt an ordering notation such that the $A_i$ gene has derived
from the male parent. Define $n_i$ by $n_i = \frac{1}{2}\sum_j (n_{ij} + n_{ji})$: Then $2n_i$ is the
number of $A_i$ genes in the population. If $N = \sum n_i$ is the population size
we may write

$$x_i = n_i/N, \quad X_{ij} = n_{ij}/N \tag{2.34}$$

as the frequencies of $A_i$ and the (ordered) genotype $A_iA_j$, respectively.
Consider a continuous-time deterministic process of population change in
which, if terms of order $(\delta t)^2$ are ignored throughout, $NX_{ij}d_{ij}\delta t$ individuals
of genotype $A_iA_j$ die in the time interval $(t, t + \delta t)$. Let $M\delta t$ be the number
of matings during this time interval, $X_{im,nj}$ be the fraction of these matings
which are of the (ordered) type $A_iA_m \times A_nA_j$, and $\tilde{a}_{im,nj}$ the number of
offspring from such a mating. We introduce the standardized parameter
$a_{im,nj} = M\tilde{a}_{im,nj}/N$, so that $NX_{im,nj}a_{im,nj}\delta t$ is the number of offspring
from all (ordered) $A_iA_m \times A_nA_j$ matings in the time interval $(t, t + \delta t)$.
Defining $n_{ij}(t)$ as the number of $A_iA_j$ individuals in the population at time
$t$ and noting that $A_iA_j$ individuals can arise from various ordered matings
in various frequencies, we get

$$n_{ij}(t + \delta t) = n_{ij}(t) + \delta t\Bigl\{\sum_{m,n} NX_{im,nj}a_{im,nj} - d_{ij}n_{ij}(t)\Bigr\}.$$

Letting $\delta t \to 0$ in the usual way, we obtain

$$\dot{n}_{ij} = \sum_{m,n} NX_{im,nj}a_{im,nj} - d_{ij}n_{ij}, \tag{2.35}$$

where the time derivative, here and below, is denoted by a superior dot.
This equation and the verbal description leading to it form the basis of the
model we shall consider.

It is convenient to define a birth-rate for $A_iA_m$ individuals. Noting
that the number of offspring (of whatever genotype) to such individuals
acting as first partner in an $A_iA_m \times A_nA_j$ mating during $(t, t + \delta t)$ is
$N\sum_{n,j} X_{im,nj}a_{im,nj}\delta t$ and that the number of $A_iA_m$ individuals available to
act as parents is $n_{im}$, it is reasonable for us to define the birth-rate $b_{im}$ for
such individuals by the equation

$$n_{im}b_{im} = N\sum_{n,j} X_{im,nj}a_{im,nj}. \tag{2.36}$$

From this, the fecundity $b_i$, mortality $d_i$, and “Malthusian parameter” $m_i$
of the allele $A_i$ are defined by

$$x_ib_i = \sum_j X_{ij}b_{ij}, \quad x_id_i = \sum_j X_{ij}d_{ij}, \quad m_i = b_i - d_i. \tag{2.37}$$

The mean fecundity $\bar{b}$, mortality $\bar{d}$, and Malthusian parameter $\bar{m}$ are then
given by

$$\bar{b} = \sum_i x_ib_i, \quad \bar{d} = \sum_i x_id_i, \quad \bar{m} = \bar{b} - \bar{d}. \tag{2.38}$$

Equations (2.35)-(2.38) jointly yield

$$\dot{N} = \bar{m}N, \tag{2.39}$$

and

$$\dot{X}_{ij} = \sum_{m,n} X_{im,nj}a_{im,nj} - (d_{ij} + \bar{m})X_{ij}, \tag{2.40}$$

$$\dot{x}_i = x_i(m_i - \bar{m}). \tag{2.41}$$

To make further progress it is necessary to make certain assumptions. We
assume first that random mating obtains, so that

$$X_{im,nj} = X_{im}X_{nj}, \tag{2.42}$$

and that $a_{im,nj}$ can be expressed in the additive form

$$a_{im,nj} = \beta_{im} + \beta_{nj} \tag{2.43}$$

for some set of parameters $\{\beta_{ij}\}$. Equation (2.43) is the natural analogue for
continuous-time models of an equation like (2.28) for discrete-time models.
Equations (2.37)-(2.43) then lead to

$$a_{im,nj} = \bar{b} + (b_{im} - \bar{b}) + (b_{nj} - \bar{b}),$$

so that

$$\dot{X}_{ij} = x_ix_j(b_i + b_j - \bar{b}) - (d_{ij} + \bar{m})X_{ij}. \tag{2.44}$$

Perhaps the most important question to ask is whether Hardy-Weinberg
frequencies hold in this model. Defining $Q_{ij} = X_{ij} - x_ix_j$ as a measure of
departure from Hardy-Weinberg, (2.41) and (2.44) yield

$$\dot{Q}_{ij} = x_ix_j(d_i + d_j - d_{ij} - \bar{d}) - (d_{ij} + \bar{m})Q_{ij}. \tag{2.45}$$

Suppose that $d_i + d_j - d_{ij} - \bar{d} \neq 0$. Then even if Hardy-Weinberg frequencies
obtain initially, (2.45) shows that they do not persist and do not hold at
an equilibrium of the system (2.35). One particular consequence of this is
that the rate of change of mean fitness is not necessarily approximately
equal to the additive genetic variance in fitness. It is of some interest to
determine the relationship between the two quantities, and we now do this
in the simple special case where the quantities $b_{ij}$ and $d_{ij}$ (which are
functions of the $X_{im,nj}$ and of time) are adjusted so that the Malthusian
parameter $m_{ij}$ $(= b_{ij} - d_{ij})$ of the genotype $A_iA_j$ is constant in time.
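
The failure of Hardy-Weinberg frequencies to persist can be seen in a small numerical experiment. The following is a minimal sketch, assuming random mating (2.42), the additive form (2.43), and constant values of $\beta_{ij}$ and $d_{ij}$; the parameter values, the forward-Euler scheme and the function names are illustrative assumptions and not part of the model as stated. It integrates (2.40) from a Hardy-Weinberg start and reports the departure $Q_{ij}$.

```python
def derivative(X, beta, d):
    """Right-hand side of (2.40) under (2.42) and (2.43).

    X, beta and d are symmetric k-by-k lists; X sums to one.
    """
    k = len(X)
    x = [sum(row) for row in X]                      # allele frequencies
    u = [sum(X[i][m] * beta[i][m] for m in range(k)) for i in range(k)]
    b_bar = 2.0 * sum(u)                             # mean fecundity
    d_bar = sum(X[i][j] * d[i][j] for i in range(k) for j in range(k))
    m_bar = b_bar - d_bar
    return [[u[i] * x[j] + x[i] * u[j] - (d[i][j] + m_bar) * X[i][j]
             for j in range(k)] for i in range(k)]


def integrate(X, beta, d, dt=0.001, steps=5000):
    """Forward-Euler integration; returns the final genotype frequencies."""
    k = len(X)
    for _ in range(steps):
        dX = derivative(X, beta, d)
        X = [[X[i][j] + dt * dX[i][j] for j in range(k)] for i in range(k)]
    return X


def hw_departure(X):
    """Q_ij = X_ij - x_i x_j."""
    k = len(X)
    x = [sum(row) for row in X]
    return [[X[i][j] - x[i] * x[j] for j in range(k)] for i in range(k)]


hw_start = [[0.25, 0.25], [0.25, 0.25]]   # Hardy-Weinberg with x1 = x2 = 0.5

# Case 1 (assumed values): heterozygotes die faster, so deaths are nonadditive.
X_nonadd = integrate(hw_start, [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.9], [0.9, 0.6]])
# Case 2 (assumed values): equal death rates, selection through births only.
X_add = integrate(hw_start, [[0.6, 0.5], [0.5, 0.4]], [[0.7, 0.7], [0.7, 0.7]])

Q_nonadd = hw_departure(X_nonadd)
Q_add = hw_departure(X_add)
```

In the first case a deficiency of heterozygotes builds up, as (2.45) predicts when $d_i + d_j - d_{ij} - \bar{d} \neq 0$. In the second case the departure remains at the level of the discretization error even though allele frequencies are changing.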

To find the additive genetic variance we minimize the quantity $S$, defined
by

$$S = \sum_i\sum_j X_{ij}(m_{ij} - \bar{m} - \alpha_i - \alpha_j)^2. \tag{2.46}$$

If Hardy-Weinberg frequencies do obtain, so that $X_{ij} = x_ix_j$, this would
be done following the lines of the analysis in Section 2.4. To measure
the effect of departure from Hardy-Weinberg frequencies we introduce the
parameters $\theta_{ij}$, defined by

$$X_{ij} = x_ix_j\theta_{ij}. \tag{2.47}$$

Clearly $\theta_{ij} = 1$ implies that Hardy-Weinberg frequencies obtain. If we
insert (2.47) into (2.46), we find that the minimization equations yield

$$x_i\alpha_i + \sum_j x_ix_j\theta_{ij}\alpha_j = \sum_j x_ix_j\theta_{ij}(m_{ij} - \bar{m}), \tag{2.48}$$

or

$$\alpha_i + \sum_j x_j\theta_{ij}\alpha_j = a_i, \tag{2.49}$$

where we define

$$a_{ij} = m_{ij} - \bar{m}, \quad a_i = x_i^{-1}\sum_j X_{ij}a_{ij}. \tag{2.50}$$

Further, the additive genetic variance, being the sum of squares removed
by this procedure, is

$$\sigma_A^2 = 2\sum_i x_ia_i\alpha_i, \tag{2.51}$$

where $a_i$ is defined explicitly by (2.50) and $\alpha_i$ implicitly by (2.49). In view
of (2.41) this may also be written

$$\sigma_A^2 = 2\sum_i \dot{x}_i\alpha_i. \tag{2.52}$$

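We turn now to the rate of change of the mean fitness $\bar{m}$. By definition

$$\bar{m} = \sum_i\sum_j m_{ij}X_{ij},$$

and since under our assumptions the $m_{ij}$ are constant,

$$\dot{\bar{m}} = \sum_i\sum_j a_{ij}(\dot{x}_ix_j\theta_{ij} + x_i\dot{x}_j\theta_{ij} + x_ix_j\dot{\theta}_{ij})$$

$$= 2\sum_i\sum_j a_{ij}\dot{x}_ix_j\theta_{ij} + \sum_i\sum_j a_{ij}x_ix_j\dot{\theta}_{ij} \tag{2.53}$$

$$= 2\sum_i \dot{x}_i\sum_j a_{ij}x_j\theta_{ij} + \sum_i\sum_j a_{ij}x_ix_j\dot{\theta}_{ij}$$

$$= 2\sum_i \dot{x}_i\Bigl(\alpha_i + \sum_j x_j\theta_{ij}\alpha_j\Bigr) + \sum_i\sum_j a_{ij}x_ix_j\dot{\theta}_{ij}$$

$$= \sigma_A^2 + 2\sum_i\sum_j \dot{x}_ix_j\theta_{ij}\alpha_j + \sum_i\sum_j a_{ij}x_ix_j\dot{\theta}_{ij}. \tag{2.54}$$

We wish to simplify the final two terms in (2.54). Now

$$x_j = \sum_i X_{ij} = \sum_i x_ix_j\theta_{ij},$$

so that

$$\sum_i x_i\theta_{ij} = 1.$$

Differentiating with respect to $t$,

$$\sum_i \dot{x}_i\theta_{ij} + \sum_i x_i\dot{\theta}_{ij} = 0 \quad \text{for each } j.$$

Thus the second term in (2.54) can be written

$$-2\sum_i\sum_j x_ix_j\alpha_j\dot{\theta}_{ij} = -\sum_i\sum_j x_ix_j(\alpha_i + \alpha_j)\dot{\theta}_{ij}.$$

The final two terms in (2.54) thus become

$$\sum_i\sum_j (a_{ij} - \alpha_i - \alpha_j)x_ix_j\dot{\theta}_{ij} = \sum_i\sum_j X_{ij}\delta_{ij}(\dot{\theta}_{ij}/\theta_{ij}),$$

where $\delta_{ij} = a_{ij} - \alpha_i - \alpha_j$ is a measure of nonadditivity in the Malthusian
parameters. We conclude that

$$\dot{\bar{m}} = \sigma_A^2 + \sum_i\sum_j X_{ij}\delta_{ij}\,d\log\theta_{ij}/dt. \tag{2.55}$$

Thus the rate of increase of mean fitness is equal to the additive genetic
variance in general only if Hardy-Weinberg frequencies hold (which, as we
have seen, in our model at least they do not) or if the Malthusian parameter
is additive ($m_{ij} - \bar{m} = \alpha_i + \alpha_j$). A more general and more important conclusion,
with $m_{ij}$ no longer kept constant, is given by Kimura (1958).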

How important then are departures from Hardy-Weinberg frequencies?
In our model (2.45) shows that departures will be negligible after some
time has passed if $d_i + d_j - d_{ij} - \bar{d} = 0$. But there is another circumstance
under which departures will also be negligible. Suppose that the deviations
$b_{ij} - \bar{b}$ and $d_{ij} - \bar{d}$ are all of order $s$, where $s$ is a small parameter. Then
Nagylaki (1976) has shown that the deviation $Q_{ij}$ defined above changes in
time (according to (2.45)) in such a way that after a small time period $t_1$
(an explicit formula for which is given by Nagylaki), $Q_{ij}$ differs from zero
only by a term of order $s$, even though at that time the gene frequencies
themselves may be far from their equilibrium values. After time $2t_1$, the
rate of change of $Q_{ij}$ is of order $s^2$. When this occurs a state of
“quasi-Hardy-Weinberg” (QHW) is said to obtain. In this case departures from
Hardy-Weinberg frequencies may be trivial, and as a consequence the mean
fitness increase theorem should hold to an excellent approximation. More
exactly, under the assumptions we have made, the term $\sigma_A^2$ in (2.55) is
of order $s^2$, and when QHW obtains the final term is of order $s^3$. Thus
the first term on the right-hand side will dominate the second, leading,
as noted, to the essential accuracy of the theorem. The only exception to
this rule occurs when the various frequencies are close to their respective
equilibrium points: Since $\sigma_A^2 = 0$ at equilibrium, it is possible that near
equilibrium $\sigma_A^2$ is smaller than the final term in (2.55). This is probably
of minor importance, and during the period of substantial change in gene
frequencies the MFIT is effectively true.



2.8 Non-Random-Mating Populations

In this section and the next we consider properties of the discrete-time
models considered above, focussing attention on the case where random
mating is no longer assumed. In this section we consider calculations
associated with the one-locus version of the mean fitness increase theorem
(MFIT), and in the next we consider calculations associated with the
Fundamental Theorem of Natural Selection (FTNS). In both sections we use a
notation that generalizes readily to the multilocus extensions considered in
later chapters.

Suppose that fitness depends on the genotype at one locus only, at which
occur alleles $A_1, A_2, \ldots, A_k$. Any form of mating is allowed, random or
otherwise. We denote the frequency of the (ordered) genotype $A_uA_v$ at the
time of conception of any generation of individuals by $X_{uv}$ $(= X_{vu})$, so
that the frequency $x_u$ of the allele $A_u$ is given by $x_u = \sum_v X_{uv}$.

We assume that the genotype $A_uA_v$ has (viability) fitness $w_{uv}$. The mean
fitness $\bar{w}$ of the population is then given by

$$\text{mean fitness} = \bar{w} = \sum_u\sum_v w_{uv}X_{uv}. \tag{2.56}$$

The additive genetic variance in fitness is found by the non-random-mating
generalization of the procedure that led to the “random-mating” expression
(2.17). That is, it is found by minimizing the function $S$, now defined more
generally than in (2.15) as

$$S = \sum_u\sum_v X_{uv}(w_{uv} - \bar{w} - \alpha_u - \alpha_v)^2, \tag{2.57}$$

subject to the constraint

$$\sum_u x_u\alpha_u = 0. \tag{2.58}$$

The values of $\alpha_1, \alpha_2, \ldots, \alpha_k$ found through this minimizing procedure,
that is the average effects of the alleles $A_1, A_2, \ldots, A_k$, are the implicit
solutions of the equations

$$x_u\alpha_u + \sum_v X_{uv}\alpha_v = x_ua_u \quad (u = 1, 2, \ldots, k), \tag{2.59}$$

where $a_u$, the average excess of the allele $A_u$, is given by

$$a_u = x_u^{-1}\sum_v X_{uv}(w_{uv} - \bar{w}). \tag{2.60}$$

Equation (2.59) shows that, under random mating, the average effect $\alpha_u$ of
$A_u$ and the average excess $a_u$ of $A_u$ are equal, since under random mating
the second term on the left-hand side of (2.59) is 0. When mating is not
random, $a_u$ and $\alpha_u$ are, in general, different from each other.

Standard regression theory shows that the sum of squares removed by
fitting the $\alpha_u$ values in (2.57), that is the additive genetic variance $\sigma_A^2$, is
given by

$$\sigma_A^2 = 2\sum_u x_ua_u\alpha_u. \tag{2.61}$$

With the definition of $a_u$ given in (2.60), the change $\Delta x_u$ in the frequency
of $A_u$ between consecutive generations is

$$\Delta x_u = x_ua_u/\bar{w}, \tag{2.62}$$

so that an alternative expression for the additive genetic variance is

$$\sigma_A^2 = 2\bar{w}\sum_u \alpha_u\Delta x_u. \tag{2.63}$$

Similarly an alternative set of formulas implicitly defining the quantities
$\{\alpha_u\}$ is

$$x_u\alpha_u + \sum_v X_{uv}\alpha_v = \bar{w}\Delta x_u, \quad u = 1, 2, \ldots, k. \tag{2.64}$$

If we define $D$ as a diagonal matrix whose $u$th term is $x_u$, $P$ as a matrix
whose $(u, v)$th term is $X_{uv}$, $\Delta$ as a vector of the $\Delta x_u$ values and $\alpha$ as a
vector of the $\alpha_u$ values, this equation can be written in matrix and vector
form as

$$(D + P)\alpha = \bar{w}\Delta. \tag{2.65}$$

When this matrix form is used, the extension of the definition of the $\alpha_u$ to
the multilocus case in Chapter 7 will be almost immediate.

An explicit solution of the equations in (2.59) for the $\alpha_u$ values is not
in general possible. However in the two-allele case an explicit solution is
straightforward. For this case we get

$$\alpha_u = a_ux_1x_2/\{X_{11}X_{12} + 2X_{11}X_{22} + X_{12}X_{22}\}, \quad u = 1, 2. \tag{2.66}$$

Under random mating $X_{uv} = x_ux_v$, and this equation confirms that in this
case $a_u$ and $\alpha_u$ are equal. The equation also shows that under non-random
mating, $a_u$ and $\alpha_u$ have the same sign and are zero or nonzero together. In
the two-allele case Fisher often described $\alpha_2 - \alpha_1$ as the average effect of
replacing $A_1$ by $A_2$, but in the $k$ allele case, to which we now return, the
definition of $\alpha_u$ simply as the average effect of $A_u$ is rather more flexible.
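
The average effects can be computed for any number of alleles by solving the linear system (2.59) numerically. The sketch below is illustrative only: the genotype frequencies and fitnesses are hypothetical, the function names are our own, and a small Gaussian-elimination routine is included so that the example needs no external library. For two alleles the numerical solution is compared with the closed form (2.66).

```python
def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def average_effects(X, w):
    """Average excesses (2.60), average effects (2.59) and the additive
    genetic variance (2.61) for ordered genotype frequencies X."""
    k = len(X)
    x = [sum(row) for row in X]
    w_bar = sum(X[u][v] * w[u][v] for u in range(k) for v in range(k))
    a = [sum(X[u][v] * (w[u][v] - w_bar) for v in range(k)) / x[u]
         for u in range(k)]
    # Matrix D + P of (2.65): X_uv plus x_u on the diagonal.
    D_plus_P = [[X[u][v] + (x[u] if u == v else 0.0) for v in range(k)]
                for u in range(k)]
    alpha = solve(D_plus_P, [x[u] * a[u] for u in range(k)])
    var_A = 2.0 * sum(x[u] * a[u] * alpha[u] for u in range(k))
    return x, w_bar, a, alpha, var_A


# Hypothetical two-allele population with an excess of homozygotes.
X = [[0.3, 0.1], [0.1, 0.5]]
w = [[1.0, 0.9], [0.9, 0.7]]
x, w_bar, a, alpha, var_A = average_effects(X, w)

# Closed form (2.66) for comparison.
denom = X[0][0] * X[0][1] + 2.0 * X[0][0] * X[1][1] + X[0][1] * X[1][1]
alpha_closed = [a_u * x[0] * x[1] / denom for a_u in a]
```

The solution of (2.59) satisfies the constraint (2.58) automatically, since summing the equations over $u$ gives $2\sum_u x_u\alpha_u = \sum_u x_ua_u = 0$.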

We now consider the change in mean fitness from one generation to
another. We write

$$w_{uv} = \bar{w} + \alpha_u + \alpha_v + \epsilon_{uv},$$

and with this definition, (2.58) implies that

$$\sum_u\sum_v X_{uv}\epsilon_{uv} = 0. \tag{2.67}$$

The frequency of $A_u$ at the birth of any given generation is $\sum_v X_{uv}$, and in
the next generation at birth it will be $\sum_v X_{uv}w_{uv}/\bar{w}$. Thus, with a prime
denoting a daughter-generation value, the change in
mean fitness between consecutive generations becomes

$$\Delta\bar{w} = \sum_u\sum_v X'_{uv}(\alpha_u + \alpha_v + \epsilon_{uv})$$

$$= 2\sum_u \alpha_ux'_u + \sum_u\sum_v (X_{uv} + \Delta X_{uv})\epsilon_{uv} \tag{2.68}$$

$$= 2\sum_u \alpha_u(\Delta x_u) + \sum_u\sum_v \Delta X_{uv}\epsilon_{uv} \quad \text{(from (2.58) and (2.67))}$$

$$= \sigma_A^2/\bar{w} + \sum_u\sum_v \Delta X_{uv}\epsilon_{uv} \quad \text{(from (2.61) and (2.62))}.$$

If the second term on the right-hand side of this expression is small, the
conclusion of the mean fitness increase theorem approximately applies.



2.9 The Fundamental Theorem of Natural Selection

We now turn to the Fundamental Theorem of Natural Selection (FTNS), 
considering first the discrete-time version, and later the continuous-time 
version, of this theorem. 

Equation (2.58) shows that $\sum_u\sum_v X_{uv}(\alpha_u + \alpha_v) = 0$, and from this the
mean fitness $\bar{w}$ may be written in the form

$$\bar{w} = \sum_u\sum_v X_{uv}(\bar{w} + \alpha_u + \alpha_v). \tag{2.69}$$

In the FTNS, Fisher considered the change in mean fitness from one
generation to another only through changes in the frequencies $X_{uv}$ in the
expression (2.69), with the quantities $\bar{w}$, $\alpha_u$ and $\alpha_v$ being kept constant.
This is called the “partial change” in mean fitness, and we denote it by
$\Delta_P(\bar{w})$. If $X'_{uv}$ is the frequency of the (ordered) genotype $A_uA_v$ in the
daughter generation, this partial change $\Delta_P(\bar{w})$ is

$$\Delta_P(\bar{w}) = \sum_u\sum_v (X'_{uv} - X_{uv})(\bar{w} + \alpha_u + \alpha_v) \tag{2.70}$$

$$= \sum_u\sum_v (X'_{uv} - X_{uv})(\alpha_u + \alpha_v)$$

$$= 2\sum_u \alpha_u\sum_v (X'_{uv} - X_{uv})$$

$$= 2\sum_u \alpha_u\Delta x_u \tag{2.71}$$

$$= \sigma_A^2/\bar{w}. \tag{2.72}$$

The final step in this sequence comes from (2.63).

We call the interpretation of the FTNS in the above form the “Price”
interpretation, since it was first given by Price (1972). This interpretation
follows the spirit of the wording in Fisher (1930, 1958).

Thus the partial change in mean fitness is exactly equal to $\sigma_A^2/\bar{w}$, and
this is the one-locus statement of the FTNS. Thus, as asserted by Fisher
(1930, 1958), the FTNS is an exact result, implying no approximations, and
it applies to non-random-mating as well as random-mating populations,
since no assumption about the mating scheme is made in the analysis. We
extend the FTNS as an exact result in Chapter 7 to the case where fitness
depends on an arbitrary number of loci, up to and including all those in
the entire genome, under any form of mating, random or otherwise.
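
The exactness of the partial change, and its independence of the mating scheme, can be illustrated numerically. The following is a sketch under stated assumptions: two alleles, hypothetical genotype frequencies and fitnesses, and a family of daughter generations indexed by an inbreeding coefficient that we introduce purely to vary the mating scheme. The average effects are taken from the closed form (2.66).

```python
def average_effects_two_alleles(X, w):
    """Allele frequencies, mean fitness, average effects from (2.66),
    and the additive genetic variance (2.61), for two alleles."""
    x = [X[0][0] + X[0][1], X[1][0] + X[1][1]]
    w_bar = sum(X[u][v] * w[u][v] for u in range(2) for v in range(2))
    a = [sum(X[u][v] * (w[u][v] - w_bar) for v in range(2)) / x[u]
         for u in range(2)]
    denom = X[0][0] * X[0][1] + 2.0 * X[0][0] * X[1][1] + X[0][1] * X[1][1]
    alpha = [a[u] * x[0] * x[1] / denom for u in range(2)]
    var_A = 2.0 * sum(x[u] * a[u] * alpha[u] for u in range(2))
    return x, w_bar, alpha, var_A


def daughter_genotypes(X, w, w_bar, inbreeding):
    """Hypothetical mating scheme: daughter genotype frequencies formed
    from the post-selection allele frequencies with a given inbreeding
    coefficient (0 gives random mating, 1 gives homozygotes only)."""
    x_new = [sum(X[u][v] * w[u][v] for v in range(2)) / w_bar for u in range(2)]
    return [[(1.0 - inbreeding) * x_new[u] * x_new[v]
             + (inbreeding * x_new[u] if u == v else 0.0)
             for v in range(2)] for u in range(2)]


def partial_change(X, X_new, w_bar, alpha):
    """Partial change (2.70): only genotype frequencies are updated."""
    return sum((X_new[u][v] - X[u][v]) * (w_bar + alpha[u] + alpha[v])
               for u in range(2) for v in range(2))


def total_change(X_new, w, w_bar):
    """Actual change in mean fitness between the two generations."""
    return sum(X_new[u][v] * w[u][v] for u in range(2) for v in range(2)) - w_bar


X = [[0.3, 0.1], [0.1, 0.5]]
w = [[1.0, 0.9], [0.9, 0.7]]
x, w_bar, alpha, var_A = average_effects_two_alleles(X, w)

schemes = (0.0, 0.3, 1.0)
daughters = [daughter_genotypes(X, w, w_bar, f) for f in schemes]
partials = [partial_change(X, Xn, w_bar, alpha) for Xn in daughters]
totals = [total_change(Xn, w, w_bar) for Xn in daughters]
```

Every entry of `partials` equals $\sigma_A^2/\bar{w}$ up to rounding, whereas the entries of `totals` differ from one mating scheme to another.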

An alternative way of writing the FTNS in this interpretation is

$$\Delta_P(\bar{w}) = \sum_u\sum_v (\Delta X_{uv})(w_{uv})_\alpha = \sigma_A^2/\bar{w}. \tag{2.73}$$

Here $(w_{uv})_\alpha = \bar{w} + \alpha_u + \alpha_v$ may be thought of as the best estimate of the
fitness of the genotype $A_uA_v$ as predicted from the alleles in that genotype.
In this form the Price interpretation bears an interesting similarity to a
second interpretation of the FTNS, one which is closer in spirit to the
wording in Fisher (1941), and which was developed by Lessard (1997).
Lessard’s interpretation uses a concept of partial change different from,
although mathematically equivalent to, that in the Price interpretation. In
the Lessard interpretation the actual fitness $w_{uv}$ of the genotype $A_uA_v$ is
retained, but the change in genotype frequency is replaced by an “alleles
derived” value. More explicitly, the statement of the theorem under this
interpretation is that

$$\Delta_P(\bar{w}) = \sum_u\sum_v (\Delta X_{uv})_\alpha w_{uv} = \sigma_A^2/\bar{w}, \tag{2.74}$$

where $(\Delta X_{uv})_\alpha$ is defined by

$$(\Delta X_{uv})_\alpha = \frac{X_{uv}(\alpha_u + \alpha_v)}{\bar{w}}. \tag{2.75}$$

$(\Delta X_{uv})_\alpha$ is not the actual change in the frequency of the genotype $A_uA_v$
from one generation to another, but may be thought of as the change as predicted
from the alleles $A_u$ and $A_v$ in that genotype. The similarity of the forms
of the middle terms in (2.73) and (2.74), and the identity of the right-hand
sides, together indicate the mathematical identity of the two concepts of
partial change. The difference between the two concepts is in the
interpretation: In the first interpretation the genes in a genotype may be thought of
as assessing the genotype fitness, while in the second they may be thought
of as assessing the change in the frequency of that genotype.

The background to Lessard’s interpretation of the FTNS is as follows.
Fisher (1941) discussed in some detail the circumstances under which the
equation

$$\frac{\Delta X_{uv}}{X_{uv}} = \frac{1}{2}\left(\frac{\Delta X_{uu}}{X_{uu}} + \frac{\Delta X_{vv}}{X_{vv}}\right) \tag{2.76}$$

will hold for all $u$ and $v$. If these equations do hold for all $u$ and $v$, then
$\Delta X_{uv}/X_{uv}$ can be expressed in the form

$$\frac{\Delta X_{uv}}{X_{uv}} = \beta_u + \beta_v \tag{2.77}$$

for some set of constants $\beta_1, \beta_2, \ldots, \beta_k$. From this,

$$X_{uv}(\beta_u + \beta_v) = \Delta X_{uv}. \tag{2.78}$$

Summation in this identity over all $v$ gives

$$x_u\beta_u + \sum_v X_{uv}\beta_v = \Delta x_u \quad \text{for all } u. \tag{2.79}$$

Equation (2.64) then shows that we may take $\beta_v = \alpha_v/\bar{w}$ for all $v$, where
$\alpha_v$ is the average effect of $A_v$. It follows from (2.77) that

$$\Delta X_{uv} = \frac{X_{uv}(\alpha_u + \alpha_v)}{\bar{w}}. \tag{2.80}$$

Comparison of this equation with (2.75) shows that when all the equations
of the form (2.76) hold, the actual change in genotype frequency (2.80) is
identical to the change as assessed by the alleles in the genotype. This
implies that the total change in mean fitness is equal to the partial change
defined in both equation (2.72) and equation (2.75).

However, equation (2.76) will hold only under very restrictive
mating conditions. The random-mating case is perhaps the most important
of these. Under random mating the equation $X_{uv}^2 = 4X_{uu}X_{vv}$ holds,
so that $2\log X_{uv} = \log 4 + \log X_{uu} + \log X_{vv}$. From this, $2\Delta\log X_{uv} =
\Delta\log X_{uu} + \Delta\log X_{vv}$. If small-order terms are ignored, so that $\Delta\log x$ can
be replaced by $(\Delta x)/x$, equation (2.76) then follows. More generally the
conclusion still follows, to this level of approximation, if $X_{uv}^2 = \lambda X_{uu}X_{vv}$
for any fixed constant $\lambda$. Again ignoring small-order terms, it follows that
the restriction $X_{uv}^2 = \lambda X_{uu}X_{vv}$ is required for the total change in mean
fitness to be predictable from parental generation genotype frequencies and
fitnesses. The point of the FTNS is that random mating is not required for
the theorem to hold, so that (2.76) does not necessarily hold. Then the
total change in mean fitness is not predictable unless the mating scheme is
known. Despite this, the FTNS holds whatever the mating scheme might
be, and whether it be known or unknown.

It is straightforward to give also a continuous-time version of the FTNS.
This shows that the continuous-time partial rate of change in mean fitness,
defined as

$$\sum_u\sum_v \dot{X}_{uv}(\alpha_u + \alpha_v), \tag{2.81}$$

is exactly equal to the additive genetic variance. We do not provide the
details since they closely follow those in the discrete-time case.

What biological relevance does the FTNS have? There are two points to 
raise here. First, the restrictive assumptions made in the theorem should 
be noted. Matters such as geographical dispersion, the existence of two 
sexes, stochastic changes in gene frequency in finite populations, and so 
on are ignored. On the other hand fertility selection is handled by Lessard 
and Castilloux’s (1995) extension of the theorem to that case. Second, 
Fisher viewed the partial change in mean fitness as that change brought 
about by natural selection. It is not clear how this interpretation can be 
sustained, and it is possible that the MFIT, even though it is restricted 
to random-mating populations and, as we show in the following section, 
might not hold when fitness depends on a two-locus and more generally a 
multilocus genotype, nevertheless gives a greater biological insight into the 
evolutionary process than does the FTNS. Associated with this view is the 
approach, initiated by Nagylaki (1974c), which delimits the circumstances 
under which the MFIT is approximately true. 



2.10 Two Loci 

So far in this chapter we have assumed that the fitness of any individual
depends on his genetic constitution at a single locus. This is of course only
an initial simplification: We have already noted in Chapter 1 that for some
questions, for example, the evolution of recombination rate, a more
complicated theory is required. We now introduce briefly the case where fitness
depends on the genetic constitution at two loci, deferring a more complete
treatment to Chapter 6. Although such a “two-locus” theory may often be
little more realistic than “single-locus” theory, it does allow at least two
advances to be made. First, some assessment can be made of the accuracy
of approximating two-locus behavior and measurements by combining two
single-locus results. Second, no assessment of the evolutionary importance
of linkage between loci can be made without at least a two-locus analysis.

For convenience we assume viability selection only, random mating and
discrete nonoverlapping generations. Consider two loci “A” and “B” at
which occur alleles $A_1$, $A_2$ and $B_1$, $B_2$, respectively, and let the
recombination fraction between the loci be $R$ $(0 < R \leq 0.5)$. (When $R = 0$ the
two loci in effect become one locus, the theory of which has already been
considered. This is why we impose the assumption $R > 0$.) It is
convenient conceptually to suppose that these loci are on the same chromosome:
The unlinked case $(R = 0.5)$ may be treated by imagining the distance
along the chromosome between the two loci to be so long that the
recombination fraction between them is 0.5. We then use the words gamete and
chromosome interchangeably in what follows.

It is possible to write down recurrence relations connecting the (ten)
zygotic frequencies (of $A_1B_1/A_1B_1$, $A_1B_2/A_1B_1$, ..., $A_2B_2/A_2B_2$). These
relations show that a simpler set of recurrence relations can be found for
the frequencies of the four gametes $A_1B_1$, $A_1B_2$, $A_2B_1$ and $A_2B_2$, called
here gametes 1, 2, 3, 4, respectively. This simplification arises through the
concept of the random union of gametes and is parallel to treating gene
frequencies rather than genotypic frequencies at a single locus.

We consider first the case where there is no selection. The gametes
forming the zygotes of any generation may be thought of as being drawn
randomly from a pool containing gametes of type 1-4 in certain
proportions. These gametes will not necessarily be passed on to the next
generation of gametes in the same proportions since, for example, there will
be a decrease in the frequency of $A_1B_1$ gametes through recombination in
$A_1B_1/A_2B_2$ individuals which might not be exactly counterbalanced by an
increase through recombination in $A_1B_2/A_2B_1$ individuals. If the frequency
of gamete $i$ is denoted $c_i$ $(i = 1, \ldots, 4)$, these arguments and some
straightforward calculations show that the frequencies $c'_i$ in the next generation are
given by



$$c'_1 = c_1 + R(c_2c_3 - c_1c_4),$$
$$c'_2 = c_2 - R(c_2c_3 - c_1c_4),$$
$$c'_3 = c_3 - R(c_2c_3 - c_1c_4),$$
$$c'_4 = c_4 + R(c_2c_3 - c_1c_4), \tag{2.82}$$

or more economically as

$$c'_i = c_i + \eta_iR(c_2c_3 - c_1c_4), \tag{2.83}$$

where

$$\eta_1 = \eta_4 = 1, \quad \eta_2 = \eta_3 = -1. \tag{2.84}$$

Several conclusions can be drawn immediately from these equations. First,
since $c'_1 + c'_2 = c_1 + c_2$ and $c'_1 + c'_3 = c_1 + c_3$, there is no change in the
frequencies of $A_1$ and $B_1$. This confirms, fortunately, the one-locus analysis
of Chapter 1. Second, elementary algebra shows that

$$c'_1c'_4 - c'_2c'_3 = (1 - R)(c_1c_4 - c_2c_3), \tag{2.85}$$

so that since $R > 0$,

$$c_1(t)c_4(t) - c_2(t)c_3(t) \to 0 \quad \text{as } t \to \infty. \tag{2.86}$$

It follows that under the assumptions we have made, in particular that of
no selection, we may reasonably assume that the equation

$$ c_1c_4 - c_2c_3 = 0 \tag{2.87} $$

holds if the population has evolved for some time. It is important to establish
what this equation means in genetical terms. Algebraic manipulation
shows that (2.87) is equivalent to

$$ \text{freq}(A_iB_j) = \text{freq}(A_i) \times \text{freq}(B_j) \tag{2.88} $$

for all possible pairs $(i, j)$. When (2.88), or equivalently (2.87), holds, the
population is said to be in a state of linkage equilibrium with respect to
these loci. The quantity $c_1c_4 - c_2c_3$, which we denote by $D$, is often called
the “coefficient of linkage disequilibrium”. As we see below, this can be a
rather misleading expression for the quantity $c_1c_4 - c_2c_3$, which we would
prefer to call the “coefficient of association”. An alternative expression for
$D$, sometimes more useful than $c_1c_4 - c_2c_3$, is

$$ D = c_1 - \text{freq}(A_1) \times \text{freq}(B_1). \tag{2.89} $$
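
The no-selection recurrence (2.82) is easy to iterate numerically. The following is a minimal sketch, not part of the original text: the starting frequencies and the value of $R$ are hypothetical, and the function names are chosen here for illustration. It prints $D$ in each generation next to the geometric decay $(1 - R)^t D(0)$ implied by (2.85).

```python
def next_generation(c, R):
    """One generation of (2.82). c = (c1, c2, c3, c4), R = recombination fraction."""
    c1, c2, c3, c4 = c
    delta = R * (c2 * c3 - c1 * c4)  # common term in (2.82)
    return (c1 + delta, c2 - delta, c3 - delta, c4 + delta)


def association(c):
    """D = c1*c4 - c2*c3, the quantity in (2.87)."""
    c1, c2, c3, c4 = c
    return c1 * c4 - c2 * c3


# Hypothetical starting point, chosen only for illustration.
c = (0.4, 0.1, 0.1, 0.4)
R = 0.2
D0 = association(c)

history = [c]
for _ in range(10):
    c = next_generation(c, R)
    history.append(c)

for t, ct in enumerate(history):
    # Second column: D(t) from the recurrence; third column: (1 - R)^t * D(0).
    print(t, round(association(ct), 6), round(D0 * (1 - R) ** t, 6))
```

The allele frequencies $c_1 + c_2$ and $c_1 + c_3$ stay fixed along the trajectory, and $c_1$ minus their product equals $D$ throughout, as (2.89) requires.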

We turn now to the case where selective differences between genotypes 
exist. In the previous chapter we used a fitness display such as that in 
(1.92), which focusses attention on the genotypes at each of the two loci. 
For theoretical purposes, however, it is usually more convenient to adopt a 
notation focussed around the two gametes making up each individual. This 
is so since, as (2.82) shows, gametic frequencies are the most natural vehi- 
cle for studying evolutionary behavior in two-locus systems under random 
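mating. We thus adopt the fitness scheme shown in (2.90) below:

$$
\begin{array}{c|cccc}
 & A_1B_1 & A_1B_2 & A_2B_1 & A_2B_2 \\ \hline
A_1B_1 & w_{11} & w_{12} & w_{13} & w_{14} \\
A_1B_2 & w_{21} & w_{22} & w_{23} & w_{24} \\
A_2B_1 & w_{31} & w_{32} & w_{33} & w_{34} \\
A_2B_2 & w_{41} & w_{42} & w_{43} & w_{44}
\end{array}
\tag{2.90}
$$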



In the notation of this fitness scheme the fitness of zygotes made up of
gametes $i$ and $j$ is written as $w_{ij}$ (which we assume equal to $w_{ji}$).
If coupling and repulsion double heterozygotes have the same fitness, then also
$w_{23} = w_{14}$. We make this assumption throughout. If, for specific purposes,
we wish to adopt a fitness display emphasizing single-locus genotypes, (2.90)
becomes





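$$
\begin{array}{c|ccc}
 & B_1B_1 & B_1B_2 & B_2B_2 \\ \hline
A_1A_1 & w_{11} & w_{12} & w_{22} \\
A_1A_2 & w_{13} & w_{14} & w_{24} \\
A_2A_2 & w_{33} & w_{34} & w_{44}
\end{array}
\tag{2.91}
$$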



The marginal fitness $w_i$ of gamete $i$ is defined by

$$ w_i = \sum_j c_j w_{ij}, \tag{2.92} $$

and the mean fitness $\bar{w}$ of the population then becomes

$$ \bar{w} = \sum_i \sum_j c_i c_j w_{ij} = \sum_i c_i w_i. \tag{2.93} $$

Consideration of all possible matings, their frequencies, and their genetic
outputs, as well as the fitnesses of the various genotypes, shows that the
gametic frequencies $c_i'$ in the following generation are given by

$$ c_i' = \bar{w}^{-1}\{c_i w_i + \eta_i R w_{14}(c_2c_3 - c_1c_4)\}, \quad i = 1, 2, 3, 4. \tag{2.94} $$

Here $\eta_i$ is defined in (2.84). If the $w_{ij}$ are all equal, these recurrence
relations reduce to (2.83). These important equations are due in this form to
Lewontin and Kojima (1960), but they were essentially derived earlier, for
a continuous-time model, by Kimura (1956b). Our present aim is to discuss
some of the more immediate consequences of these equations.

First, the mean fitness, as defined in (2.93), is similar in form to the
definition (2.10) with $k = 4$. It follows from the discussion in Section 2.4
that if we assume that mean fitness is maximized at a unique internal
($c_i > 0$) point, then at this point $w_i = \bar{w}$, where now $w_i$ and $\bar{w}$
are defined by (2.92) and (2.93). What is the connection between this
maximization point and the equilibrium points of the system (2.94)? The
equations $c_i' = c_i$ show that the system (2.94) is in equilibrium when

$$ \bar{w} = w_i + c_i^{-1}\eta_i R w_{14}(c_2c_3 - c_1c_4), \quad i = 1, \ldots, 4. \tag{2.95} $$

Unless linkage equilibrium holds at the equilibrium point, this point cannot 
be a point of maximum fitness. We show later that linkage equilibrium holds 
at equilibrium only in special cases, so that mean fitness can decrease in 
the system (2.94). The MFIT cannot then be true in general in two-locus 
selection systems. By contrast, we shall show in Section 7.4.5 that the
FTNS does hold with a multilocus fitness scheme, and thus in particular
with a two-locus fitness scheme.

We now demonstrate the possible decrease in mean fitness by a numerical
example. Suppose, using the notation (2.91), that the fitness scheme is





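$$
\begin{array}{c|ccc}
 & B_1B_1 & B_1B_2 & B_2B_2 \\ \hline
A_1A_1 & 1.000 & 1.024 & 1.021 \\
A_1A_2 & 1.025 & 1.066 & 1.026 \\
A_2A_2 & 1.018 & 1.019 & 1.007
\end{array}
\tag{2.96}
$$

and that the $A$ and $B$ loci are unlinked ($R = 1/2$). If initially

$$ c_1 = 0.168, \quad c_2 = 0.362, \quad c_3 = 0.292, \quad c_4 = 0.178, \tag{2.97} $$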



the population mean fitness is 1.033106. The mean fitness now decreases
for about 14 generations and after that steadily increases, reaching a value
of 1.031212 at the equilibrium point

$$ c_1 = 0.24136, \quad c_2 = 0.28164, \quad c_3 = 0.22192, \quad c_4 = 0.25508. \tag{2.98} $$

The net effect of the evolution of the population from the starting point
(2.97) to the equilibrium point (2.98) is to decrease mean fitness by
0.001894. At this equilibrium point the value of $D = c_1c_4 - c_2c_3$ is
$-0.000935$.
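
The numerical example can be reproduced by iterating the recurrence (2.94) directly. The sketch below is an illustration rather than the author's own computation: it stores the table (2.96) by gamete pair using the mapping of (2.91), takes $R = 1/2$ for unlinked loci, and uses function names invented here. It prints the mean fitness and $D$ every fifth generation from the starting point (2.97).

```python
# Fitnesses w_ij from the numerical table (2.96), keyed by gamete pair (i <= j)
# using the gamete-to-genotype mapping of display (2.91); w_23 = w_14.
W = {
    (1, 1): 1.000, (1, 2): 1.024, (2, 2): 1.021,
    (1, 3): 1.025, (1, 4): 1.066, (2, 3): 1.066, (2, 4): 1.026,
    (3, 3): 1.018, (3, 4): 1.019, (4, 4): 1.007,
}
ETA = (1, -1, -1, 1)  # eta_1, ..., eta_4 from (2.84)


def w(i, j):
    """Symmetric lookup of w_ij for gametes i, j in 1..4."""
    return W[(min(i, j), max(i, j))]


def marginal_fitness(c, i):
    """w_i = sum_j c_j w_ij, equation (2.92)."""
    return sum(c[j - 1] * w(i, j) for j in range(1, 5))


def mean_fitness(c):
    """Mean fitness, equation (2.93)."""
    return sum(c[i - 1] * marginal_fitness(c, i) for i in range(1, 5))


def association(c):
    """D = c1*c4 - c2*c3."""
    return c[0] * c[3] - c[1] * c[2]


def step(c, R):
    """One generation of the recurrence (2.94)."""
    wbar = mean_fitness(c)
    d = c[1] * c[2] - c[0] * c[3]  # c2*c3 - c1*c4
    return tuple(
        (c[i - 1] * marginal_fitness(c, i) + ETA[i - 1] * R * w(1, 4) * d) / wbar
        for i in range(1, 5)
    )


c_start = (0.168, 0.362, 0.292, 0.178)       # starting point (2.97)
c_eq = (0.24136, 0.28164, 0.22192, 0.25508)  # equilibrium point (2.98)
R = 0.5                                      # unlinked loci

c = c_start
for t in range(31):
    if t % 5 == 0:
        print(t, round(mean_fitness(c), 6), round(association(c), 6))
    c = step(c, R)
```

Evaluating `mean_fitness` at `c_start` and at `c_eq` gives the two mean fitness values quoted in the text, and `step(c_eq, R)` returns `c_eq` to within the five-decimal rounding of (2.98).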

Apart from the fact that mean fitness can decrease, the above analysis 
demonstrates two further points. The first is that the coefficient of linkage 
disequilibrium can be nonzero at an equilibrium point of the evolutionary 
system, even though the two loci upon which fitness depends are unlinked. 
This is why we prefer the term “coefficient of association” for the quantity
$c_1c_4 - c_2c_3$, rather than the term “coefficient of linkage disequilibrium”.

The second point to observe is that the location of the equilibrium point 
or points of (2.94) will depend on the recombination fraction R between the 
loci in those cases where linkage equilibrium does not obtain at equilibrium. 
Thus various values of R can be considered and the equilibrium mean 
fitnesses computed for each. When $R = 0$ the “equilibrium” equation (2.95)
and the “maximization” equation $\bar{w} = w_i$ ($i = 1, \ldots, 4$) agree, so that if
each $c_i > 0$ at equilibrium, the greatest equilibrium mean fitness is achieved
for $R = 0$. This conclusion remains true if some
of the $c_i$ are zero at equilibrium but strangely, as we see later, it is not
necessarily true that equilibrium mean fitness is a monotonically decreasing
function of $R$. To the extent that equilibrium mean fitness is maximized
for extremely tight linkage, the argument of Fisher given in Chapter 1
concerning the evolution of tight linkage between epistatic loci is justified.
This argument can be made only when $D \neq 0$ at equilibrium for all $R$
values: if $D = 0$ at equilibrium for all $R$, the equilibrium mean fitness is
independent of $R$.

The third topic we treat, at rather greater length, concerns the additive
genetic variance in fitness. We are particularly interested in the relationship
between this and the two marginal single-locus values, and we begin by
defining the latter. Using the fitness scheme (2.91), we may define the
marginal fitnesses of the various single-locus genotypes as follows:



$$
\begin{array}{lll}
\text{Genotype} & \text{Frequency} & \text{Marginal fitness} \\ \hline
A_1A_1 & (c_1 + c_2)^2 & u_{11} = (w_{11}c_1^2 + 2w_{12}c_1c_2 + w_{22}c_2^2)/(c_1 + c_2)^2 \\
A_1A_2 & 2(c_1 + c_2)(c_3 + c_4) & u_{12} = (w_{13}c_1c_3 + w_{14}c_1c_4 + w_{14}c_2c_3 + w_{24}c_2c_4)/\{(c_1 + c_2)(c_3 + c_4)\} \\
A_2A_2 & (c_3 + c_4)^2 & u_{22} = (w_{33}c_3^2 + 2w_{34}c_3c_4 + w_{44}c_4^2)/(c_3 + c_4)^2 \\
B_1B_1 & (c_1 + c_3)^2 & v_{11} = (w_{11}c_1^2 + 2w_{13}c_1c_3 + w_{33}c_3^2)/(c_1 + c_3)^2 \\
B_1B_2 & 2(c_1 + c_3)(c_2 + c_4) & v_{12} = (w_{12}c_1c_2 + w_{14}c_1c_4 + w_{14}c_2c_3 + w_{34}c_3c_4)/\{(c_1 + c_3)(c_2 + c_4)\} \\
B_2B_2 & (c_2 + c_4)^2 & v_{22} = (w_{22}c_2^2 + 2w_{24}c_2c_4 + w_{44}c_4^2)/(c_2 + c_4)^2
\end{array}
\tag{2.96}
$$

From (1.42), the marginal additive genetic variance at the $A$ locus may be
defined as

$$ 2(c_1 + c_2)(c_3 + c_4)G_A^2, \tag{2.97} $$

where

$$ G_A = u_{11}(c_1 + c_2) + u_{12}(1 - 2c_1 - 2c_2) - u_{22}(c_3 + c_4). \tag{2.98} $$

Similarly the marginal additive genetic variance at the $B$ locus is

$$ 2(c_1 + c_3)(c_2 + c_4)G_B^2, \tag{2.99} $$

where

$$ G_B = v_{11}(c_1 + c_3) + v_{12}(1 - 2c_1 - 2c_3) - v_{22}(c_2 + c_4). \tag{2.100} $$

We now find the two-locus additive genetic variance. To do this we assign
additive parameters $\alpha_{11}$ and $\alpha_{12}$ to $A_1$ and $A_2$ and
$\alpha_{21}$ and $\alpha_{22}$ to $B_1$ and $B_2$, and then minimize the expression

$$
S = c_1^2(w_{11} - \bar{w} - 2\alpha_{11} - 2\alpha_{21})^2
  + 2c_1c_2(w_{12} - \bar{w} - 2\alpha_{11} - \alpha_{21} - \alpha_{22})^2
  + \cdots + c_4^2(w_{44} - \bar{w} - 2\alpha_{12} - 2\alpha_{22})^2
$$

with respect to the $\alpha_{ij}$. Now that two loci are involved in the minimization
it is appropriate to add constraints on the $\alpha_{ij}$ since, for example, adding
some constant to each $\alpha_{1i}$ and subtracting the same constant from each
$\alpha_{2i}$ does not change the value of $S$. Such a change would, however, affect the
definitions of marginal additive genetic variances. The natural constraints
to impose are those which arise automatically in the one-locus case as given
in (2.58). In the two-locus case these are

$$ (c_1 + c_2)\alpha_{11} + (c_3 + c_4)\alpha_{12} = 0, \qquad (c_1 + c_3)\alpha_{21} + (c_2 + c_4)\alpha_{22} = 0, \tag{2.101} $$

and the minimization is carried out subject to these constraints. Details of
this procedure are given by Kojima and Kelleher (1961) and Kimura (1965)
and are not pursued here. It is found that the additive genetic variance can
be written as

$$ 2\{(c_1 + c_2)(c_3 + c_4)H_A^2 + 2H_AH_BD + (c_1 + c_3)(c_2 + c_4)H_B^2\}, \tag{2.102} $$

where $H_A$ and $H_B$ are the solutions of the equations

$$
\begin{aligned}
H_A + \{(c_1 + c_2)(c_3 + c_4)\}^{-1}DH_B &= G_A, \\
H_B + \{(c_1 + c_3)(c_2 + c_4)\}^{-1}DH_A &= G_B,
\end{aligned}
\tag{2.103}
$$

$G_A$ and $G_B$ being given by (2.98) and (2.100).

Several interesting conclusions follow from these equations. Perhaps the
most important is that if $D = 0$ (that is, linkage equilibrium between the
two loci) then $H_A = G_A$, $H_B = G_B$, and the true two-locus additive genetic
variance is the sum of the two single-locus marginal values. When $D \neq 0$
this is no longer true, and there is no simple relationship between this sum
and the true two-locus additive genetic variance value. This is an important
conclusion since it seems to be widely assumed in the classical literature
(see for example Fisher (1918, p. 405), (1958, p. 37) and Wright (1969, p.
439)) that in a multilocus system the true additive genetic variance can be
found by simply summing single-locus marginal values. Since we have shown
above that changes in mean fitness can be negative in two-locus systems,
and thus cannot be equal to any form of genetic variance, it follows that

$$ \Delta\bar{w}, \qquad \sigma_A^2\ (\text{two-locus}), \qquad \sum\sigma_A^2\ (\text{single-locus marginals}) \tag{2.104} $$

have in general no clear and obvious connection with each other. This
conclusion is generalized in Section 7.3.3.

These conclusions may also be associated with properties of changes in
gene frequency. Equations (2.97), (2.99), and (2.102) show that

$$ \sigma_A^2\ (\text{two-locus}) - \sum\sigma_A^2\ (\text{single-locus marginals}) = -2D(G_AH_B + H_AG_B), \tag{2.105} $$

and if $D$ is small this may be approximated by $-4DG_AG_B$. Since

$$ \Delta(\text{frequency } A_1) = (c_1 + c_2)(c_3 + c_4)G_A/\bar{w}, $$

with a corresponding expression for $\Delta(\text{frequency } B_1)$, it is found, if terms
of order $D^2$ are ignored, that the left-hand side in (2.105) may be written

$$ \frac{-4D\bar{w}^2\,\Delta(\text{frequency } A_1)\,\Delta(\text{frequency } B_1)}{(c_1 + c_2)(c_3 + c_4)(c_1 + c_3)(c_2 + c_4)}. $$

This gives an interesting relationship between the various additive genetic
variances, the linkage disequilibrium, and the gene frequency changes in a
two-locus system. If in a certain generation $\Delta(\text{frequency } A_1) = 0$, then to
the order of accuracy we use the equation $G_A = 0$ holds, and the total
additive genetic variance is simply the marginal $B$ locus value. However,
this is true only as an approximation and, more precisely, whenever there
is linkage disequilibrium between the $A$ and $B$ loci there is a small
perturbation from the $A$ locus to the total additive variance, even though gene
frequencies are not changing at that locus.
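
The relationship between the two-locus additive genetic variance and the sum of the single-locus marginal values can be checked numerically. The sketch below is an illustration under stated assumptions: it reuses the fitness values of the numerical example (2.96) and the starting frequencies of that example, solves the pair of linear equations (2.103) by direct substitution, and uses names invented here. It prints the difference between (2.102) and the marginal sum next to the small-$D$ approximation $-4DG_AG_B$.

```python
# Fitnesses of the numerical example, keyed by gamete pair (i <= j) as in (2.91).
W = {
    (1, 1): 1.000, (1, 2): 1.024, (2, 2): 1.021,
    (1, 3): 1.025, (1, 4): 1.066, (2, 3): 1.066, (2, 4): 1.026,
    (3, 3): 1.018, (3, 4): 1.019, (4, 4): 1.007,
}


def w(i, j):
    return W[(min(i, j), max(i, j))]


def additive_variances(c):
    c1, c2, c3, c4 = c
    pA, qA = c1 + c2, c3 + c4  # frequencies of A1 and A2
    pB, qB = c1 + c3, c2 + c4  # frequencies of B1 and B2
    D = c1 * c4 - c2 * c3
    # Marginal single-locus genotype fitnesses (the marginal fitness table).
    u11 = (w(1, 1) * c1**2 + 2 * w(1, 2) * c1 * c2 + w(2, 2) * c2**2) / pA**2
    u12 = (w(1, 3) * c1 * c3 + w(1, 4) * c1 * c4
           + w(2, 3) * c2 * c3 + w(2, 4) * c2 * c4) / (pA * qA)
    u22 = (w(3, 3) * c3**2 + 2 * w(3, 4) * c3 * c4 + w(4, 4) * c4**2) / qA**2
    v11 = (w(1, 1) * c1**2 + 2 * w(1, 3) * c1 * c3 + w(3, 3) * c3**2) / pB**2
    v12 = (w(1, 2) * c1 * c2 + w(1, 4) * c1 * c4
           + w(2, 3) * c2 * c3 + w(3, 4) * c3 * c4) / (pB * qB)
    v22 = (w(2, 2) * c2**2 + 2 * w(2, 4) * c2 * c4 + w(4, 4) * c4**2) / qB**2
    GA = u11 * pA + u12 * (1 - 2 * pA) - u22 * qA  # (2.98)
    GB = v11 * pB + v12 * (1 - 2 * pB) - v22 * qB  # (2.100)
    # Solve the two linear equations (2.103) for H_A and H_B by substitution.
    det = 1 - D * D / (pA * qA * pB * qB)
    HA = (GA - D * GB / (pA * qA)) / det
    HB = (GB - D * GA / (pB * qB)) / det
    two_locus = 2 * (pA * qA * HA**2 + 2 * HA * HB * D + pB * qB * HB**2)  # (2.102)
    marginal_sum = 2 * pA * qA * GA**2 + 2 * pB * qB * GB**2  # (2.97) + (2.99)
    return {"D": D, "GA": GA, "GB": GB, "HA": HA, "HB": HB,
            "two_locus": two_locus, "marginal_sum": marginal_sum}


r = additive_variances((0.168, 0.362, 0.292, 0.178))
diff = r["two_locus"] - r["marginal_sum"]
# Exact difference versus the small-D approximation -4*D*G_A*G_B.
print(diff, -4 * r["D"] * r["GA"] * r["GB"])
```

Substituting (2.103) into (2.102) shows that the exact difference is $-2D(G_AH_B + H_AG_B)$, which is what the code reproduces to rounding error; for frequencies with $D = 0$ the two variances coincide.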

We expect the additive genetic variance to be of importance in discussing 
the correlation between relatives. Before exploring this, we recall that gene 
frequencies alone are not sufficient to describe the evolution of two-locus 
systems, so that it is reasonable to argue that the additive genetic vari- 
ance, which fundamentally involves gene frequencies, is not the appropriate 
component of variance for evolutionary considerations. We thus consider a 
variance defined by gamete frequencies which, since gamete frequencies
do describe the evolutionary behavior, might be thought to be of greater
evolutionary significance than the additive genetic variance.

The marginal fitnesses $w_i$ of the four gametes have been defined in (2.92).
The total chromosomal, or gametic, variance in fitness, denoted $\sigma_G^2$, may
be defined by

$$ \sigma_G^2 = 2\sum_{i=1}^{4}(w_i - \bar{w})^2c_i, \tag{2.106} $$

the factor 2 being inserted because there are two gametes per zygote.
Suppose now we attempt to fit the marginal gametic fitnesses by additive
components depending on the genes on each gamete. This is done by
minimizing

$$
c_1(w_1 - \bar{w} - \alpha_{11} - \alpha_{21})^2 + c_2(w_2 - \bar{w} - \alpha_{11} - \alpha_{22})^2
+ c_3(w_3 - \bar{w} - \alpha_{12} - \alpha_{21})^2 + c_4(w_4 - \bar{w} - \alpha_{12} - \alpha_{22})^2
$$

with respect to $\alpha_{11}$, $\alpha_{12}$, $\alpha_{21}$ and $\alpha_{22}$, subject to the
constraints in (2.101). The sum of squares so removed may be described as being
due to the additive effects of genes within gametes, and for short may be called
the additive gametic variance. It is found (see Kimura (1965)) that this is
identical to the additive genetic variance (2.102) and thus the latter, perhaps
unexpectedly, is of use in evolutionary and other considerations. This conclusion
is generalized in Section 7.3.3. The total gametic variance in (2.106) has three
degrees of freedom, of which the additive component has two. The remaining
degree of freedom is taken up by the epistatic gametic variance
$\sigma_{EG}^2$, which is

$$ \sigma_{EG}^2 = 2(w_1 - w_2 - w_3 + w_4)^2/(c_1^{-1} + c_2^{-1} + c_3^{-1} + c_4^{-1}). \tag{2.107} $$

This is zero if and only if an additive genetic fitness scheme exactly fits the
marginal gametic fitnesses.
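
The split of the gametic variance into an additive and an epistatic part can be illustrated with a small weighted least-squares fit. This is a sketch only: the gamete frequencies and marginal fitnesses below are hypothetical values chosen for illustration, the constraints (2.101) are built in by using centered indicator variables for $A_1$ and $B_1$, and the function name is invented here.

```python
def gametic_variance_components(c, wm):
    """c: gamete frequencies (A1B1, A1B2, A2B1, A2B2); wm: marginal fitnesses w_i."""
    wbar = sum(ci * wi for ci, wi in zip(c, wm))
    y = [wi - wbar for wi in wm]
    total = 2 * sum(ci * yi * yi for ci, yi in zip(c, y))  # (2.106)

    pA, pB = c[0] + c[1], c[0] + c[2]
    # Centered indicators: alpha_11 = a*(1 - pA), alpha_12 = -a*pA satisfies (2.101),
    # and similarly for the B locus with coefficient b.
    X = [1 - pA, 1 - pA, -pA, -pA]
    Y = [1 - pB, -pB, 1 - pB, -pB]
    sxx = sum(ci * xi * xi for ci, xi in zip(c, X))
    syy = sum(ci * yi * yi for ci, yi in zip(c, Y))
    sxy = sum(ci * xi * yi for ci, xi, yi in zip(c, X, Y))
    sxw = sum(ci * xi * wi for ci, xi, wi in zip(c, X, y))
    syw = sum(ci * yi * wi for ci, yi, wi in zip(c, Y, y))

    # Normal equations of the weighted least-squares fit.
    det = sxx * syy - sxy * sxy
    a = (sxw * syy - syw * sxy) / det
    b = (syw * sxx - sxw * sxy) / det
    additive = 2 * (a * sxw + b * syw)  # sum of squares removed by the fit

    epistatic = 2 * (wm[0] - wm[1] - wm[2] + wm[3]) ** 2 / sum(1 / ci for ci in c)  # (2.107)
    return total, additive, epistatic


# Hypothetical inputs, for illustration only.
c = (0.3, 0.2, 0.1, 0.4)
wm = (1.02, 1.05, 1.04, 1.01)
total, additive, epistatic = gametic_variance_components(c, wm)
print(total, additive, epistatic, additive + epistatic)
```

The last two printed numbers agree with the first, reflecting the three degrees of freedom of (2.106): two taken by the additive fit and one by (2.107).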

We turn now to the correlation between relatives, restricting attention
to the case where (2.88) holds, that is, where the two loci are in linkage
equilibrium. This assumption was also made by Fisher (1918). We consider
both linked and unlinked loci: Fisher’s 1918 analysis is concerned only with
the unlinked case. Our treatment is based on Cockerham (1954, 1956) and
Kempthorne (1954).

We first isolate various components of the total variance of the character 
measured. Suppose that the measurements for the various genotypes are 





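$$
\begin{array}{c|ccc}
 & B_1B_1 & B_1B_2 & B_2B_2 \\ \hline
A_1A_1 & m_{11} & m_{12} & m_{13} \\
A_1A_2 & m_{21} & m_{22} & m_{23} \\
A_2A_2 & m_{31} & m_{32} & m_{33}
\end{array}
\tag{2.108}
$$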



We form these measurements into a single vector $m = (m_{11}, m_{12}, \ldots, m_{33})'$.
If the frequency of $A_1$ is $x$ and of $B_1$ is $y$, then since linkage equilibrium is
assumed, the frequency of $A_1A_1B_1B_1$ is $x^2y^2$, of $A_1A_1B_1B_2$ is $2x^2y(1 - y)$,
and so on. It is convenient to write these frequencies as the entries in a
diagonal matrix $F$, so that

$$
F = \begin{pmatrix}
x^2y^2 & & & 0 \\
 & 2x^2y(1 - y) & & \\
 & & \ddots & \\
0 & & & (1 - x)^2(1 - y)^2
\end{pmatrix}.
\tag{2.109}
$$



Evidently the mean value $\bar{m}$ in the measurement is given by

$$ \bar{m} = x^2y^2m_{11} + 2x^2y(1 - y)m_{12} + \cdots + (1 - x)^2(1 - y)^2m_{33}. \tag{2.110} $$

Further, adopting the notation of (2.96), the marginal means of $A_1A_1$,
$A_1A_2$ and $A_2A_2$ are

$$
\begin{aligned}
u_{11} &= y^2m_{11} + 2y(1 - y)m_{12} + (1 - y)^2m_{13}, \\
u_{12} &= y^2m_{21} + 2y(1 - y)m_{22} + (1 - y)^2m_{23}, \\
u_{22} &= y^2m_{31} + 2y(1 - y)m_{32} + (1 - y)^2m_{33}.
\end{aligned}
\tag{2.111}
$$

Similarly the marginal means at the $B$ locus are

$$
\begin{aligned}
v_{11} &= x^2m_{11} + 2x(1 - x)m_{21} + (1 - x)^2m_{31}, \\
v_{12} &= x^2m_{12} + 2x(1 - x)m_{22} + (1 - x)^2m_{32}, \\
v_{22} &= x^2m_{13} + 2x(1 - x)m_{23} + (1 - x)^2m_{33}.
\end{aligned}
\tag{2.112}
$$

Finally the total variance $\sigma^2$ in the character measured is

$$ \sigma^2 = x^2y^2m_{11}^2 + \cdots + (1 - x)^2(1 - y)^2m_{33}^2 - \bar{m}^2 = m'Fm - \bar{m}^2. \tag{2.113} $$

This total variance has eight degrees of freedom, and our aim is to break it 
down into the sum of eight components, each having one degree of freedom 
and each being of genetical significance. These components will measure two 
additive variances, one at each of the two loci, two dominance variances, 
one at each of the two loci and the four interaction variances. 

Suppose a matrix $T$ exists such that $TFT' = I$ (or equivalently
$(T')^{-1}F^{-1}T^{-1} = I$), where $I$ is the unit $9 \times 9$ matrix, and define a vector
$z$ by $z = TFm$. Then

$$
\begin{aligned}
m'Fm &= z'(T')^{-1}F^{-1}FF^{-1}T^{-1}z \\
     &= z'z \\
     &= z_1^2 + z_2^2 + \cdots + z_9^2.
\end{aligned}
\tag{2.114}
$$

If the last row in $T$ can be chosen to be $(1, 1, \ldots, 1)$, then $z_9 = \bar{m}$ and

$$ \sigma^2 = z_1^2 + z_2^2 + \cdots + z_8^2. \tag{2.115} $$

The equation $TFT' = I$ reduces to the requirement

$$ x^2y^2t_{i1}t_{j1} + 2x^2y(1 - y)t_{i2}t_{j2} + \cdots + (1 - x)^2(1 - y)^2t_{i9}t_{j9} = \delta_{ij}, \tag{2.116} $$

where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise. The choice
$t_{91} = t_{92} = \cdots = t_{99} = 1$ does satisfy (2.116) with $i = j = 9$. Thus
$\sigma^2$ can indeed be broken down into the sum (2.115), where

$$ z_i = x^2y^2t_{i1}m_{11} + 2x^2y(1 - y)t_{i2}m_{12} + \cdots + (1 - x)^2(1 - y)^2t_{i9}m_{33}, \tag{2.117} $$

provided that the $t_{ij}$ satisfy (2.116) and the further requirement

$$ x^2y^2t_{i1} + 2x^2y(1 - y)t_{i2} + \cdots + (1 - x)^2(1 - y)^2t_{i9} = 0, \quad i = 1, \ldots, 8. \tag{2.118} $$

Apart from these purely mathematical requirements we wish to choose the
$t_{ij}$ so that the $z_i$ have the genetical interpretations described above.

Suppose $z_1^2$ and $z_2^2$ are to represent the additive and dominance variance
components of the character from the $A$ locus. Recalling equations (1.9)
and using the marginal values (2.111), we would like to have

$$
\begin{aligned}
z_1^2 &= 2x(1 - x)\{xu_{11} + (1 - 2x)u_{12} - (1 - x)u_{22}\}^2, \\
z_2^2 &= x^2(1 - x)^2\{2u_{12} - u_{11} - u_{22}\}^2.
\end{aligned}
\tag{2.119}
$$

Such a representation is in fact possible if, in (2.117), we choose

$$
\begin{aligned}
t_{11} = t_{12} = t_{13} &= x^{-1}\{2x(1 - x)\}^{1/2}, \\
t_{14} = t_{15} = t_{16} &= (1 - 2x)\{2x(1 - x)\}^{-1/2}, \\
t_{17} = t_{18} = t_{19} &= -(1 - x)^{-1}\{2x(1 - x)\}^{1/2},
\end{aligned}
\tag{2.120}
$$

and

$$
\begin{aligned}
t_{21} = t_{22} = t_{23} &= -x^{-1}(1 - x), \\
t_{24} = t_{25} = t_{26} &= 1, \\
t_{27} = t_{28} = t_{29} &= -(1 - x)^{-1}x.
\end{aligned}
\tag{2.121}
$$

These choices do satisfy the requirements (2.116) and (2.118), and thus
our desired representation (2.119) is allowable. A parallel procedure gives
additive and dominance variance components at the $B$ locus as

$$ z_3^2 = 2y(1 - y)\{yv_{11} + (1 - 2y)v_{12} - (1 - y)v_{22}\}^2 $$

and

$$ z_4^2 = y^2(1 - y)^2\{2v_{12} - v_{11} - v_{22}\}^2. $$

Once more, with the choice of the $t_{ij}$ implicit in these definitions, the
orthogonality conditions are met. If $z_5^2$ is to represent the additive-by-additive
component of the total variance it would be natural to choose $t_{5i} = t_{1i} \times t_{3i}$,
and the remaining three interactive components would naturally be chosen
by similar multiplications. If this is done it is found that all the
orthogonality conditions are met, and this also implies that the representation
(2.115) is completed. We do not go into details here and note only that the
various components can be expressed as

$$
\begin{aligned}
(\text{add} \times \text{add}):\quad z_5^2 &= 4xy(1 - x)(1 - y)\{xye_{11} + x(1 - y)e_{12} + (1 - x)ye_{21} + (1 - x)(1 - y)e_{22}\}^2, \\
(\text{add} \times \text{dom}):\quad z_6^2 &= 2x(1 - x)y^2(1 - y)^2\{x(e_{11} - e_{12}) + (1 - x)(e_{21} - e_{22})\}^2, \\
(\text{dom} \times \text{add}):\quad z_7^2 &= 2x^2(1 - x)^2y(1 - y)\{y(e_{11} - e_{21}) + (1 - y)(e_{12} - e_{22})\}^2, \\
(\text{dom} \times \text{dom}):\quad z_8^2 &= x^2y^2(1 - x)^2(1 - y)^2\{e_{11} - e_{12} - e_{21} + e_{22}\}^2,
\end{aligned}
\tag{2.122}
$$

where

$$
\begin{aligned}
e_{11} &= m_{11} - m_{12} - m_{21} + m_{22}, \\
e_{12} &= m_{12} - m_{13} - m_{22} + m_{23}, \\
e_{21} &= m_{21} - m_{22} - m_{31} + m_{32}, \\
e_{22} &= m_{22} - m_{23} - m_{32} + m_{33}.
\end{aligned}
$$



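These expressions, given more generally to include the effect of inbreeding,
were derived by Cockerham (1954). It is sometimes convenient to write

$$
\sigma_A^2 = z_1^2 + z_3^2, \quad
\sigma_D^2 = z_2^2 + z_4^2, \quad
\sigma_{AA}^2 = z_5^2, \quad
\sigma_{AD}^2 = z_6^2 + z_7^2, \quad
\sigma_{DD}^2 = z_8^2,
$$

so that

$$ \sigma^2 = \sigma_A^2 + \sigma_D^2 + \sigma_{AA}^2 + \sigma_{AD}^2 + \sigma_{DD}^2. \tag{2.123} $$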

A slightly shorter representation collects the final three terms as a single
epistatic variance term, but for our purposes this is not useful, since
the final three terms in (2.123) are involved differently in the correlation
between relatives, and are therefore best kept separate.
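
The decomposition (2.123) can be verified numerically for any set of measurements. The following sketch is illustrative only: the measurement values and the allele frequencies $x$ and $y$ are hypothetical, the indexing `m[a][b]` (row for the $A$-locus genotype, column for the $B$-locus genotype, as in (2.108)) is a convention chosen here, and the components are computed directly from (2.119) and (2.122) and their $B$-locus analogues.

```python
def variance_components(m, x, y):
    """m[a][b]: measurement for A genotype a and B genotype b (0, 1, 2 = 11, 12, 22)."""
    fA = [x * x, 2 * x * (1 - x), (1 - x) ** 2]  # A-locus genotype frequencies
    fB = [y * y, 2 * y * (1 - y), (1 - y) ** 2]  # B-locus genotype frequencies
    idx = [(a, b) for a in range(3) for b in range(3)]
    mean = sum(fA[a] * fB[b] * m[a][b] for a, b in idx)                  # (2.110)
    total = sum(fA[a] * fB[b] * m[a][b] ** 2 for a, b in idx) - mean**2  # (2.113)

    u = [sum(fB[b] * m[a][b] for b in range(3)) for a in range(3)]  # (2.111)
    v = [sum(fA[a] * m[a][b] for a in range(3)) for b in range(3)]  # (2.112)

    z1 = 2 * x * (1 - x) * (x * u[0] + (1 - 2 * x) * u[1] - (1 - x) * u[2]) ** 2
    z2 = x**2 * (1 - x) ** 2 * (2 * u[1] - u[0] - u[2]) ** 2
    z3 = 2 * y * (1 - y) * (y * v[0] + (1 - 2 * y) * v[1] - (1 - y) * v[2]) ** 2
    z4 = y**2 * (1 - y) ** 2 * (2 * v[1] - v[0] - v[2]) ** 2

    e11 = m[0][0] - m[0][1] - m[1][0] + m[1][1]
    e12 = m[0][1] - m[0][2] - m[1][1] + m[1][2]
    e21 = m[1][0] - m[1][1] - m[2][0] + m[2][1]
    e22 = m[1][1] - m[1][2] - m[2][1] + m[2][2]

    z5 = 4 * x * y * (1 - x) * (1 - y) * (
        x * y * e11 + x * (1 - y) * e12 + (1 - x) * y * e21 + (1 - x) * (1 - y) * e22
    ) ** 2
    z6 = 2 * x * (1 - x) * y**2 * (1 - y) ** 2 * (x * (e11 - e12) + (1 - x) * (e21 - e22)) ** 2
    z7 = 2 * x**2 * (1 - x) ** 2 * y * (1 - y) * (y * (e11 - e21) + (1 - y) * (e12 - e22)) ** 2
    z8 = (x * y * (1 - x) * (1 - y)) ** 2 * (e11 - e12 - e21 + e22) ** 2

    # Each z above is already a squared component (z_k^2 in the text).
    return {"total": total, "A": z1 + z3, "D": z2 + z4,
            "AA": z5, "AD": z6 + z7, "DD": z8}


# Hypothetical measurements and allele frequencies, for illustration only.
m = [[1.000, 1.024, 1.021],
     [1.025, 1.066, 1.026],
     [1.018, 1.019, 1.007]]
comp = variance_components(m, x=0.4, y=0.7)
parts = comp["A"] + comp["D"] + comp["AA"] + comp["AD"] + comp["DD"]
print(comp["total"], parts)
```

For a purely additive set of measurements (for example `m[a][b] = a + 2 * b`) all components other than the additive one vanish, which is a quick sanity check on the indexing.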

Consider now the father-son and the full sib correlations in the measurement.
It is possible to write down all 81 father-son genotypic combinations
and, using a table extending Table 1.1, arrive at a father-son covariance.
By doing this and a parallel procedure for full sibs, it is found that if the
$A$ and $B$ loci are unlinked,

$$ \text{corr(father-son)} = (\tfrac{1}{2}\sigma_A^2 + \tfrac{1}{4}\sigma_{AA}^2)/\sigma^2, \tag{2.124a} $$

$$ \text{corr(full sibs)} = (\tfrac{1}{2}\sigma_A^2 + \tfrac{1}{4}\sigma_D^2 + \tfrac{1}{4}\sigma_{AA}^2 + \tfrac{1}{8}\sigma_{AD}^2 + \tfrac{1}{16}\sigma_{DD}^2)/\sigma^2. \tag{2.124b} $$

Cockerham (1956) demonstrated that, when the two loci are linked, the
former expression remains unchanged but that the latter must be replaced
by

$$
\begin{aligned}
\text{corr(full sibs)} = \{&\tfrac{1}{2}\sigma_A^2 + \tfrac{1}{4}\sigma_D^2 + \tfrac{1}{8}(3 - 4R + 4R^2)\sigma_{AA}^2 \\
 &+ \tfrac{1}{4}(1 - 2R + 2R^2)\sigma_{AD}^2 \\
 &+ \tfrac{1}{4}(1 - 2R + 2R^2)^2\sigma_{DD}^2\}/\sigma^2.
\end{aligned}
\tag{2.125}
$$

The effect of linkage is always to increase the full sib correlation compared 
to the value for the unlinked case. We derive these formulas later in Chapter 
7 as particular cases of correlations where the trait in question depends on 
an arbitrary number of loci, using a more efficient approach. 
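
The effect of linkage on the full sib correlation can be tabulated with a few lines of code. This is a sketch under stated assumptions: the variance components are hypothetical numbers, and the coefficients used are $\tfrac12$, $\tfrac14$, $\tfrac18(3 - 4R + 4R^2)$, $\tfrac14(1 - 2R + 2R^2)$ and $\tfrac14(1 - 2R + 2R^2)^2$ for the linked case, which reduce at $R = 1/2$ to the unlinked coefficients $\tfrac12$, $\tfrac14$, $\tfrac14$, $\tfrac18$, $\tfrac1{16}$.

```python
def father_son_corr(v):
    """Father-son correlation; v holds the variance components of (2.123)."""
    total = v["A"] + v["D"] + v["AA"] + v["AD"] + v["DD"]
    return (0.5 * v["A"] + 0.25 * v["AA"]) / total


def full_sib_corr(v, R=0.5):
    """Full sib correlation for recombination fraction R (R = 0.5 means unlinked)."""
    total = v["A"] + v["D"] + v["AA"] + v["AD"] + v["DD"]
    k = 1 - 2 * R + 2 * R * R
    numerator = (
        0.5 * v["A"]
        + 0.25 * v["D"]
        + (3 - 4 * R + 4 * R * R) / 8 * v["AA"]
        + 0.25 * k * v["AD"]
        + 0.25 * k * k * v["DD"]
    )
    return numerator / total


# Hypothetical variance components, for illustration only.
v = {"A": 0.5, "D": 0.2, "AA": 0.15, "AD": 0.1, "DD": 0.05}
print("father-son:", father_son_corr(v))
for R in (0.5, 0.25, 0.1, 0.0):
    print("R =", R, "full sibs:", full_sib_corr(v, R))
```

With these inputs the full sib correlation rises as $R$ falls from 1/2 to 0, in line with the statement that linkage always increases it relative to the unlinked case.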

The analysis in this section has assumed a discrete-time model, and it is
expected that qualitatively similar conclusions would hold for a continuous
model. One possible complication for such models does, however, occur. In
the discrete models the frequency of any genotype is found immediately
from the frequencies of the gametes making up this genotype, so that, for
example,

$$ \text{freq}(A_1A_1B_1B_2) = 2\,\text{freq}(A_1B_1)\,\text{freq}(A_1B_2). \tag{2.126} $$

In the continuous-time model of Nagylaki and Crow (1974) the existence of
linkage disequilibrium between the two loci implies that “Hardy-Weinberg”
equations such as (2.126) are no longer true. This is of some interest since
many theoretical analyses of continuous-time two-locus models have
assumed the truth of equations like (2.126). However, Nagylaki (1976) has
shown that when fitness differentials are small a state of
“quasi-Hardy-Weinberg” soon emerges when genotypic frequencies can, to a very close
approximation, be found from the constituent gametic frequencies.



2.11 Genetic Loads 

A genetic load is said to arise if the population mean fitness is less than
some optimal value which in some idealized sense it could take. The
two forms of genetic load that have caused considerable controversy in the
literature are the substitutional load and the segregational load. In both
cases the load $\ell$ is defined by

$$ \ell = (w_{\max} - \bar{w})/\bar{w}, \tag{2.127} $$

where $w_{\max}$ is the fitness of the most fit genotype and $\bar{w}$ is the mean fitness.
If we normalize fitnesses so that the mean fitness is 1, we replace (2.127)
by

$$ \ell = w_{\max} - 1. \tag{2.128} $$

Our aim in this section is to analyze the formal calculations for both forms
of load. These formal calculations have remained implicit rather than ex- 
plicit in the analyses of proponents of genetic loads as calculated by the 
formula (2.128). Before doing this we briefly review the historical context. 

The load concept was introduced by Haldane (1957, 1961) in the substi- 
tutional case. As a result of his load calculations, Haldane placed a quite 
conservative limit on the rate at which favorable new alleles at different loci, 
arising perhaps by mutations or perhaps by an environmental change ren- 
dering a previously unfavorable allele favorable, could spread throughout a 
population. Specifically, he came to the conclusion that as a result of what 
became known as the substitutional load (his “cost of natural selection”), 
substitutional processes at different loci could not start more frequently 
than about 300 generations apart. 

As we observe below, a load in effect refers to a variance in fitness, not
to a mean fitness. The essence of the substitutional load argument is that
if many selectively driven substitutional processes are occurring in some
population at any given time, then there will exist a substantial variance
in fitness of the individuals in the population of interest at that time.
Individuals carrying the favored allele at all the loci substituting will then
have a very high fitness, that is, will be required to produce an extremely
large number of offspring. This is in effect the substitutional load placed
on the population.

The load concept was subsequently extended to define a segregational
load, the motivation being the observation, in the 1960’s, that there exists
considerable genetic variation in natural populations. The segregational
load argument claimed that under a selective explanation for the variation, 
perhaps because of heterozygote advantage at many of the loci exhibiting 
genetic variation, the most fit individuals in the population would again 
have a very high fitness and thus would be required to produce an extremely 
large number of offspring. This led to comments such as that of Dobzhansky 
(1970, page 220), that “higher vertebrates and man do not possess enough 
‘load space’ to maintain more than a few balanced polymorphisms,” lead- 
ing to the view (page 224) that selection favoring heterozygotes “cannot 
explain the polymorphisms observed in man.” At about the same time, seg- 
regational load arguments and subsequently substitutional load arguments 
were used by Kimura (1968) to support his neutral theory of evolution. The 
aim of this section is to show that the (implicit) arguments of Haldane, 
Dobzhansky and Kimura are all unjustified. 

The segregational and substitutional genetic load “problem” arises when 
segregation occurs or substitutions take place at many loci simultaneously. 
The implicit assumption made in load calculations by proponents of the 
load concept is that multilocus fitnesses are obtained by first constructing 
single locus fitnesses and then multiplying these over the loci segregating 
or substituting. We initially make this (surely unrealistic) assumption so as
to follow load calculations and arguments, but later discuss more realistic
fitness models.

We start with a discussion of the segregational load. This load exists because
of segregation at a number of loci arising from heterozygote selective
advantage at each locus. For simplicity we assume two alleles segregating
at each locus and a fitness scheme where, at each locus, each homozygote
has fitness $1 - \tfrac{1}{2}s$ and the heterozygote has fitness $1 + \tfrac{1}{2}s$. Thus with
the multiplicative assumption, and with two loci segregating, the two-locus
fitness scheme (2.91) would be

$$
\begin{array}{c|ccc}
 & B_1B_1 & B_1B_2 & B_2B_2 \\ \hline
A_1A_1 & (1 - \tfrac{1}{2}s)^2 & (1 - \tfrac{1}{2}s)(1 + \tfrac{1}{2}s) & (1 - \tfrac{1}{2}s)^2 \\
A_1A_2 & (1 - \tfrac{1}{2}s)(1 + \tfrac{1}{2}s) & (1 + \tfrac{1}{2}s)^2 & (1 - \tfrac{1}{2}s)(1 + \tfrac{1}{2}s) \\
A_2A_2 & (1 - \tfrac{1}{2}s)^2 & (1 - \tfrac{1}{2}s)(1 + \tfrac{1}{2}s) & (1 - \tfrac{1}{2}s)^2
\end{array}
$$



With many loci segregating the multilocus fitness scheme is the natural 
generalization of the two-locus fitness scheme above. We emphasize again 
that this model is discussed here since this is the model implicitly assumed 
in load calculations. 

The equilibrium properties of this model are not straightforward. We
shall see later (see (6.33)) that when the recombination fraction $R$ between
$A$ and $B$ loci is sufficiently large, the stable equilibrium frequencies of $A_1$,
$A_2$, $B_1$ and $B_2$ are all 1/2, and the mean fitness is 1, as a straightforward
multiplication of single-locus values would suggest. However, when $R$ is
sufficiently small the picture is more complicated and the population mean
fitness exceeds 1 at the stable equilibrium point of the system. We defer
consideration of this case until later and assume for the moment the “loose
linkage” case.

More generally, for $m$ sufficiently loosely linked loci and a multiplicative
fitness model generalizing the two-locus scheme above, the equilibrium
frequencies of all alleles at all loci are 1/2. Any individual is a heterozygote
at $j$ of these $m$ loci with probability

$$ \binom{m}{j}\left(\tfrac{1}{2}\right)^m, $$

so that the equilibrium population mean fitness is

$$ \sum_{j=0}^{m}\binom{m}{j}\left(\tfrac{1}{2}\right)^m(1 + \tfrac{1}{2}s)^j(1 - \tfrac{1}{2}s)^{m-j} = 1. \tag{2.129} $$

An individual heterozygous at all loci has fitness $(1 + \tfrac{1}{2}s)^m$, and a formal
application of the definition (2.128) implies that the segregational load is

$$ (1 + \tfrac{1}{2}s)^m - 1 \approx e^{sm/2} - 1. \tag{2.130} $$

This can be substantial for large values of $m$, and thus the formal calculation
directly leads to the segregational load “problem”.

We return to this calculation below, and turn next to the substitutional 
load. We consider first the substitution process at one single gene locus, 
and initially, to follow formal substitutional load calculations, we do not 
scale fitnesses to make the mean population fitness equal to 1. 

Suppose that at the locus of interest, fitnesses of the form (1.25b) apply,
with $s > 0$. It is convenient, and does not materially affect the substance of
the argument, to assume that $h = 0.5$. Then because of natural selection,
the frequency of the allele $A_1$ will steadily increase in the population. When
the frequency of $A_1$ is $x$ the population mean fitness is $1 + sx$, and the load,
as defined by (2.128), is $s(1 - x)$. The overall substitutional load $L$ for the
entire substitution process is defined as the sum of this quantity during the
process when $x$ increases from a small value $x_1$ (at time $t_1$) to a value $x_2$
close to unity (at time $t_2$). Thus

$$
\begin{aligned}
L &= \sum s(1 - x) \\
  &\approx \int_{t_1}^{t_2} s(1 - x)\,dt \\
  &= 2\int_{x_1}^{x_2} x^{-1}\,dx \quad \text{from (1.27)} \\
  &= 2\log(x_2/x_1).
\end{aligned}
$$

Since $x_2$ is close to 1, this differs only trivially from $-2\log x_1$. Unfortunately
the value chosen for $x_1$ will depend to a large extent on the view one takes of
the most likely form of genetic evolution, and the discussion in Section 1.7
becomes relevant to the argument. A value often chosen for evolutionary
load arguments is $x_1 = 0.0001$, and this gives $L = 18.4$. When $h \neq 0.5$
the load as calculated using this form of calculation usually exceeds 18.4,
and for operational purposes the “representative value” $L = 30$ is generally
used in the load argument. We therefore adopt this value also.
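
The value $L = 18.4$ can be checked by summing the per-generation load along a simulated substitution. The sketch below rests on assumptions not stated in this section: it takes the genotype fitnesses to be $1 + s$, $1 + \tfrac12 s$ and $1$ (one reading of the $h = 0.5$ case that gives mean fitness $1 + sx$), uses the standard discrete-generation recurrence for the allele frequency, and picks hypothetical values $s = 0.01$ and $x_2 = 0.9999$.

```python
import math


def substitutional_load(s, x1, x2):
    """Sum s*(1 - x) over generations while x rises from x1 to x2.

    Assumes fitnesses 1 + s, 1 + s/2, 1 for A1A1, A1A2, A2A2 (h = 0.5).
    Returns the summed load and the number of generations taken.
    """
    x, total, generations = x1, 0.0, 0
    while x < x2:
        total += s * (1 - x)                 # load in this generation
        marginal = 1 + 0.5 * s * (1 + x)     # marginal fitness of allele A1
        x = x * marginal / (1 + s * x)       # mean fitness is 1 + s*x
        generations += 1
    return total, generations


# Hypothetical selection coefficient and end point; x1 = 0.0001 as in the text.
L_sim, T_sim = substitutional_load(s=0.01, x1=0.0001, x2=0.9999)
L_formula = 2 * math.log(0.9999 / 0.0001)
print(L_sim, L_formula, T_sim)
```

The simulated sum is close to $2\log(x_2/x_1)$; the small excess comes from replacing the integral by a sum over discrete generations.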

What does this calculation mean for the offspring requirement of the
individuals in any given generation? Suppose that all selection is through
viability differences and the number of reproducing adults in each generation
remains constant at $N$. A considerable proportion of the depletion
in population numbers between birth and the age of reproduction is
non-genetic. Taking only the genetic component, and supposing there is no
depletion through genetic deaths of the optimal genotype $A_1A_1$, a
straightforward calculation shows that when the frequency of $A_1$ is $x$, there must
be $N(1 + s)/(1 + sx)$ individuals at birth, so that after differential viabilities
operate there are $N$ individuals at the age of maturity. Thus the average
individual is required to leave approximately $1 + s(1 - x)$ offspring after
non-genetic deaths are taken into account, so that there will be $Ns(1 - x)$
“genetic deaths” in each generation associated with the evolutionary process.
Summed over the entire process this gives $NL$ individuals in all. If
each substitutional process takes $T$ generations, this implies an average of
$NL/T$ such “deaths” in each generation.

Consider now a sequence of loci at which substitutions start regularly
$n$ generations apart. For convenience it is assumed that the same fitness
parameters apply for all these loci as for the single locus discussed above.
As in the segregational load argument, it is implicitly assumed in load
arguments that fitnesses are multiplicative over loci, so initially we make
this assumption also. As with the segregational load, the substitutional
load relates to the fitness, or offspring requirement, of an individual of the
most fit genotype. In this case this is an individual with the superior genotype
“$A_1A_1$” at each locus undergoing substitution.

At any one time there will be $T/n$ substitutions in progress and thus a
total of $(NL/T)(T/n) = NL/n$ “selective deaths” per generation. From this
it is found that the offspring requirement of the most fit individual, assuming
the multiplicative model of fitness and with linkage equilibrium
always holding between loci, is

$$ (1 + L/T)^{T/n} \approx \exp(L/n) \approx \exp(30/n) \tag{2.131} $$

if we take the “representative value” 30 for $L$ as discussed above.

The value $n = 300$ reached by Haldane (1957), as described above, arises
from the fact that with this value of $n$, the expression in (2.131) is about
1.1, conforming to his view that an “excess reproductive requirement” of
10% is the maximum that can be expected, at least in mammals.

Kimura and Ohta (1971a) estimated that in the evolutionary history of
mammals approximately six substitutions have been completed per generation
in any evolutionary line. This implies that $n = 1/6$, 1800 times smaller
than the Haldane “limiting” value, or equivalently that substitutions occur
at 1800 times the upper rate calculated by Haldane. Insertion of the value
$n = 1/6$ in (2.131) leads to a substitutional load of
$e^{180} \approx 10^{78}$. This form of calculation was a major factor in
the development of the neutral theory, since it was argued (Kimura (1968))
that the amount of genetic substitution estimated to have taken place in
evolution, in particular in mammalian evolution, could not be explained by
selective processes because of a claimed unbearable substitutional genetic
load that selective substitutions would imply. Thus Kimura and Ohta (1971a)
claimed that

“to carry out mutant substitution at the above rate, each parent
must leave $e^{180} \approx 10^{78}$ offspring for only one of the offspring
to survive. This was the main reason why random fixation of
selectively neutral mutants was first proposed by one of us as
the main factor in molecular evolution.”

Because of calculations and claims of this type, it is clearly necessary
to discuss the assumptions, both explicit and implicit, in formal load
calculations.
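
The arithmetic behind these two figures can be checked directly. The
following is a minimal sketch, assuming only the expression $\exp(L/n)$ in
(2.131) with $L = 30$; the function name `offspring_requirement` is an
illustrative choice and does not appear in the text.

```python
import math

def offspring_requirement(n, L=30.0):
    """Approximate offspring requirement exp(L/n) of the most fit
    individual, from (2.131), for substitutions n generations apart."""
    return math.exp(L / n)

# Haldane's spacing of n = 300 generations between substitutions.
haldane = offspring_requirement(300)

# For n = 1/6 the exponent is L/n = 180; express e^180 as a power of ten.
log10_kimura_ohta = (30.0 / (1.0 / 6.0)) / math.log(10)

print(f"n = 300: {haldane:.3f}")
print(f"n = 1/6: about 10^{log10_kimura_ohta:.1f}")
```

The second quantity is reported on a logarithmic scale only to keep the
output readable.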

We start with the expression in (2.131), and observe that this expression 
refers not to the offspring requirement of every individual, as is implied 
in the above quotation, but to the requirement of an individual of the 
maximum possible fitness when the population mean fitness is now scaled 
to 1. It is therefore appropriate to focus on this individual and on his fitness. 

Our calculations show that the fitness $e^{180}$ is arrived at by assuming
that fitnesses are multiplicative over loci. This is a quite unreasonable as- 
sumption, and the large offspring requirement of the most fit individual is 
a direct consequence of it. It is certainly true that in nature substantial 
epistasis occurs, and if this is so there will be a considerable reduction to 
the load from that calculated formally by using marginal fitnesses and mul- 
tiplicativity, as discussed below. The unreasonableness of the multiplicative 
assumption was stressed long ago, in particular by Wright (1930). 

The second, and more important, problem concerns the very existence of
an individual of the optimal multilocus genotype. It is extremely unlikely
that such an individual ever exists. To simplify the argument we continue
to consider the multiplicative case discussed above. It can be shown that
with the individual locus fitness values $1 + s$, $1 + s/2$, $1$ for
“$A_1A_1$, $A_1A_2$ and $A_2A_2$”, as is assumed above, and with
$s = 0.01$, $n = 1/6$, initial frequency = 0.0001, final frequency = 0.9999,
there will be 22,080 loci substituting at any one time. The various favored
alleles at each of these 22,080 loci will take a variety of frequencies in
$(0, 1)$, and in particular at those loci where the substitution has only
recently started, the frequency of the favored allele will be quite low. By
calculating the means of the frequencies $x_1, x_2, \ldots$ of the favored
allele at the various loci substituting, using (1.28), it is found that the
probability that an individual taken at random is of this optimal genotype
is on the order of $10^{-23,200}$. This value is so extremely small that
a theory basing its numerical computations on the offspring requirement
of such an individual must demand reconsideration. This point also was
stressed by Wright (1977, p. 481).

What is needed is a calculation of the fitness of the individuals who 
might reasonably be expected to occur in the population of interest. Here 
the finite size of any population is an important factor in the calculations. 
Some progress on amending load calculations for this purpose may be made 
by using the statistics of extreme values in a population of given finite size 
(Kimura (1969), Ewens (1970)). It is convenient, for purposes of illustration 
only, to maintain the multiplicativity assumption here so as to discuss the 
point at issue. The starting point is to find the variance of the distribution 
of the fitness of an individual taken at random from the population, if the 
population mean fitness is scaled to unity. In the case considered above 
this variance is $s/n$ (Ewens (1970), Crow and Kimura (1970, p. 252)). For
$s = 0.01$, $n = 1/6$, this is a variance of 0.06, so that the standard
deviation in fitness is approximately 0.245. The rather low value for this
standard deviation arises because it is most unlikely that any individual
will have
a genetic constitution which differs markedly, in terms of the number of 
favored genes carried, from the average. 

If $s$ is extremely small we may suppose, to a first approximation, that the
distribution of fitness is a normal distribution. The statistical theory of
extreme values (see Pearson and Hartley (1958, Table 28)) shows that, for
example in a population of size $10^5$, the most fit individual that is
likely to occur will have a fitness approximately four standard deviations
in excess of the mean. In the present case this implies a fitness of
$1 + 4(0.245) = 1.98$. On average, then, the most fit individual that is
likely to exist in the population is required to produce only about two
offspring in order to effect the gene substitutions observed. This is
clearly an easily achievable goal.
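
The numbers in this argument are easy to reproduce. The sketch below
assumes the variance $s/n$ and the “four standard deviations” rule quoted
above. The normal quantile computed at the end is an added cross-check of
the factor four, not a calculation taken from the text, and the variable
names are illustrative.

```python
import math
from statistics import NormalDist

s, n = 0.01, 1.0 / 6.0      # selection coefficient and spacing from the text
pop_size = 10**5

sd = math.sqrt(s / n)        # standard deviation of fitness (variance s/n)
most_fit = 1 + 4 * sd        # fitness four standard deviations above the mean

# Added cross-check: the standard normal value exceeded by about one
# individual in pop_size, which should be close to four.
z = NormalDist().inv_cdf(1 - 1 / pop_size)

print(f"sd = {sd:.3f}, most fit = {most_fit:.2f}, z = {z:.2f}")
```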

A parallel argument holds for the segregational load as calculated in
(2.130). The segregational load is clearly the excess over the mean of the
offspring of the most fit individual, in the segregational load case the
multiple heterozygote. The probability that an individual chosen at random
in the population is of this genotype is $(1/2)^m$, and when $m$ is large it
is extremely unlikely that any individual in a population even of size
several million has this genotype. As with the substitutional load, it is
more reasonable to consider the fitness of the most fit individual likely to
arise in the population. This is done as follows.

The mean fitness of the population is calculated in (2.129). The variance
in fitness is then found as

$$
\sum_{i=0}^{m} \binom{m}{i} \left(\tfrac{1}{2}\right)^{m}
  \left(1 + \tfrac{1}{2}s\right)^{2i}
  \left(1 - \tfrac{1}{2}s\right)^{2(m-i)} - 1.
\tag{2.132}
$$

This expression reduces to

$$
\left(1 + \tfrac{1}{4}s^2\right)^m - 1 \approx e^{ms^2/4} - 1. \tag{2.133}
$$

For the case $m = 10,000$, $s = 0.01$ this is about 0.28. A fitness four
standard deviations above the mean is only just in excess of 3, and arguing
as above for the substitutional load, this clearly is an achievable fitness
for the most fit individual likely to arise in a population of size $10^5$.
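
The reduction of the binomial sum to the closed form can be confirmed
numerically. The following sketch assumes single-locus fitnesses
$1 + s/2$ for heterozygotes and $1 - s/2$ for homozygotes, with each locus
heterozygous with probability $1/2$; this reading is an assumption inferred
from the form of the variance expression. The function names are
illustrative.

```python
import math

def variance_by_sum(m, s):
    """Binomial sum for the variance in fitness: i heterozygous loci out
    of m, each heterozygous with probability 1/2 (assumed fitnesses
    1 + s/2 and 1 - s/2, so that the mean fitness is 1)."""
    total = 0.0
    for i in range(m + 1):
        total += (math.comb(m, i) * 0.5**m
                  * (1 + s / 2) ** (2 * i)
                  * (1 - s / 2) ** (2 * (m - i)))
    return total - 1

def variance_closed_form(m, s):
    """Closed form (1 + s^2/4)^m - 1."""
    return (1 + s * s / 4) ** m - 1

# Direct summation is only practical for moderate m, so the identity is
# checked on a small case and the closed form is used for m = 10,000.
small_sum = variance_by_sum(50, 0.1)
small_closed = variance_closed_form(50, 0.1)
var = variance_closed_form(10_000, 0.01)
most_fit = 1 + 4 * math.sqrt(var)

print(small_sum, small_closed)
print(round(var, 3), round(most_fit, 2))
```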

The essence of the argument, in both the substitutional load and the 
segregational load cases, is that in a finite population only a minute pro- 
portion of all theoretically possible genotypes are realized, and that those 
that are realized are not normally very “extreme”. In particular the fitness
of the most fit existing genotype is not extreme, and in the substitution 
case, substitutions at the required rate can easily be achieved through each 
individual’s producing as many offspring as this most fit existing genotype, 
with consequent differential viability effecting the required substitutions. 

There are many further arguments that make the substitutional load
calculations leading to the value $e^{180}$ of dubious value. First, it has
been assumed in all the calculations that selection arises entirely through
viability differences. To the extent that fertility selection occurs, the
offspring requirement is correspondingly lowered, in the sense that the
calculation of the offspring requirement of the most fit individual is not a
calculation of any relevance to the average individual.

Second, it has been assumed so far that fitnesses are fixed constants, 
and are not, for example, frequency-dependent. It is possible to devise 
frequency-dependent selection schemes for which there is no segregational 
load at a stable equilibrium. Thus in the fitness scheme

$$
\begin{array}{ccc}
A_1A_1 & A_1A_2 & A_2A_2 \\
1 + a(1 - 2x) & 1 & 1 - a(1 - 2x)
\end{array}
\tag{2.134}
$$

where $x$ is the frequency of $A_1$ and $a$ is a small parameter, the point
$x = 0.5$ is a point of stable equilibrium, and at this point all genotypes
have equal fitness and there is no genetic load. On the other hand, it is 
unlikely that frequency-dependent fitnesses can reduce the substitutional 
load to zero, since with a change in gene frequencies due to selection, some 
selective differentials are necessary and hence some load. Little informa- 
tion is available on the extent to which frequency-dependent selection can 
reduce substitutional load. 
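
The stability of the point $x = 0.5$ under the scheme (2.134) can be
illustrated by iterating the usual deterministic recursion for gene
frequency under viability selection. This is a minimal sketch assuming
random mating and discrete generations; the value $a = 0.1$, the starting
frequencies and the function names are illustrative choices.

```python
def next_frequency(x, a):
    """One generation of selection under the frequency-dependent scheme
    (2.134), assuming random mating; x is the frequency of A1."""
    w11 = 1 + a * (1 - 2 * x)
    w12 = 1.0
    w22 = 1 - a * (1 - 2 * x)
    mean_w = x * x * w11 + 2 * x * (1 - x) * w12 + (1 - x) ** 2 * w22
    return x * (x * w11 + (1 - x) * w12) / mean_w

def iterate(x0, a, generations):
    """Apply next_frequency repeatedly, starting from frequency x0."""
    x = x0
    for _ in range(generations):
        x = next_frequency(x, a)
    return x

a = 0.1                                   # illustrative small parameter
from_below = iterate(0.2, a, 2000)        # start below the equilibrium
from_above = iterate(0.9, a, 2000)        # start above the equilibrium
print(from_below, from_above)
```

Both starting points are drawn toward the same frequency, at which the
three fitnesses in (2.134) coincide.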

We now consider the effects of linkage disequilibrium, and later of epis- 
tasis and linkage disequilibrium jointly, on load calculations. Stationary 
points of an evolutionary system exhibiting linkage disequilibrium gener- 
ally have a higher mean fitness than points where linkage equilibrium holds 
at stationarity, and thus have a lower genetic load than that at linkage equi- 
librium equilibria. This is particularly so when the selective system implies 
epistasis. However, even in the simple multiplicative case, where we can 
say there is no multiplicative epistasis, the stable equilibrium points of 
the evolutionary system can display linkage disequilibrium and thus a de- 
creased segregational load. For example, the calculations of Franklin and 
Lewontin (1970) show that in the case of 36 equally spaced linked loci, a 
multiplicative fitness scheme generalizing the two-locus multiplicative fit- 
ness scheme above with s = 0.1, and with recombination fraction 0.0025 
between adjacent loci, the load when calculated from (2.130) is about 5, but 
when calculated using the actual population mean fitness is about 1.6. The
smaller load is due to the linkage disequilibrium that arises in this model.
This point has also in effect been made by Lewontin (1974, pp. 289-290)
in the context of discussing the effect of linkage disequilibrium on mean
fitness.

Next, the joint effects of epistasis and linkage disequilibrium can decrease
the segregational load substantially. Thus, for example, numerical
computation shows that with the epistatic scheme (2.96) and with
$R = 0.001$, there is a stable equilibrium set of gametic frequencies at

$$
c_1 = 0.013, \quad c_2 = 0.469, \quad c_3 = 0.503, \quad c_4 = 0.015.
\tag{2.135}
$$

At this point the population mean fitness is 1.0417 and thus the genetic
load as defined by (2.128) is 0.0233.

Suppose now that marginal fitness values for this case are found from 
(2.111), and the load calculated according to (2.127) using these marginal 
values and the marginal genotypic frequencies. The loads so calculated are 
0.0212 for the A locus and 0.0210 for the B locus. The sum of these is
almost twice the true load; for $R = 0$ it would be exactly twice.
Evidently for general fitness schemes involving tight linkage and epistasis,
the procedure leading to the load calculation of $e^{180}$, namely the
calculation
of a multilocus segregational load through an amalgamation of single-locus 
segregational load calculations, can lead to serious errors. 

If we take into account, then, the unreasonable multiplicative fitness re- 
quirement implicit in load calculations, the unreasonable concentration on 
the fitness requirement of essentially impossible genotypes, the possibility 
of very substantial linkage disequilibria, the possibility of frequency- 
dependent fitnesses and a variety of other ecological and evolutionary 
arguments concerning the real nature of selective processes, it appears that 
there is no reason for load arguments to imply very conservative bounds 
on the number of loci that can undergo simultaneous selective substitu- 
tion processes, no “load space” argument limiting the number of balanced 
polymorphisms arising at any one time in a population, and no load theory 
support for the neutral theory of evolution. 



2.12 Finite Markov Chains 

Some of the arguments presented later in this book use the theory of finite 
Markov chains, and in this section a brief and informal introduction to the 
theory of these is presented. 

Consider a discrete random variable $X$ which at time points 0, 1, 2, 3, ...
takes one or other of the values $0, 1, 2, \ldots, M$. We shall say that $X$,
or the system, is in state $E_i$ if $X$ takes the value $i$. Suppose that at
some time $t$, the random variable $X$ is in state $E_i$. Then if the
probability $p_{ij}$ that at time $t + 1$ the random variable is in state
$E_j$ is independent of $t$ and also of the states occupied by $X$ at times
$t - 1, t - 2, \ldots$, the variable $X$ is said to be Markovian, and its
probability laws follow those of a finite Markov chain. If the initial
probability (at $t = 0$) that $X$ is in $E_i$ is $a_i$, then the probability
that $X$ is in the states $E_i, E_j, E_k, E_l, E_m, \ldots$ at times
$0, 1, 2, 3, 4, \ldots$ is $a_i p_{ij} p_{jk} p_{kl} p_{lm} \cdots$.

Complications to Markov chain theory arise if periodicities occur, for
example, if $X$ can return to $E_i$ only at the time points
$t_1, 2t_1, 3t_1, \ldots$ for some integer $t_1 > 1$. Further minor
complications arise if the states $E_0, E_1, \ldots, E_M$ can be broken down
into noncommunicating subsets. To avoid unnecessary complications, which
never in any event arise in genetical applications, we suppose that no
periodicities exist and that, apart from the possibility of a small number
of absorbing states ($E_i$ is absorbing if $p_{ii} = 1$), no breakdown into
noncommunicating subsets occurs.

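It is convenient to collect the $p_{ij}$ into a matrix $P = \{p_{ij}\}$, so
that

$$
P = \begin{pmatrix}
p_{00} & p_{01} & \cdots & p_{0M} \\
p_{10} & p_{11} & \cdots & p_{1M} \\
\vdots & \vdots &        & \vdots \\
p_{M0} & p_{M1} & \cdots & p_{MM}
\end{pmatrix}.
\tag{2.136}
$$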



The probability $p_{ij}^{(2)}$ that $X$ is in $E_j$ at time $t + 2$, given it
is in $E_i$ at time $t$, is evidently

$$
p_{ij}^{(2)} = \sum_k p_{ik} p_{kj}.
$$

Since the right-hand side is the $(i, j)$th element in the matrix $P^2$, if
we write $P^{(t)} = \{p_{ij}^{(t)}\}$, then

$$
P^{(t)} = P^t \tag{2.137}
$$

for $t = 2$. More generally (2.137) is true for any positive integer $t$. In
all cases we consider, $P^t$ can be written in the spectral form

$$
P^t = \lambda_0^t r_0 \ell_0' + \lambda_1^t r_1 \ell_1' + \cdots
      + \lambda_M^t r_M \ell_M', \tag{2.138}
$$

where $\lambda_0, \lambda_1, \ldots, \lambda_M$
($|\lambda_0| \geq |\lambda_1| \geq \cdots \geq |\lambda_M|$) are the
eigenvalues of $P$ and $(\ell_0, \ldots, \ell_M)$ and $(r_0, \ldots, r_M)$,
normalized so that

$$
\ell_i' r_i = \sum_{j=0}^{M} \ell_{ij} r_{ij} = 1, \tag{2.139}
$$

are the corresponding left and right eigenvectors, respectively. Suppose
$E_0$ and $E_M$ are absorbing states and that no other states are absorbing.
Then $\lambda_0 = \lambda_1 = 1$ and if $|\lambda_2| > |\lambda_3|$ and
$i, j = 1, 2, \ldots, M - 1$,

$$
p_{ij}^{(t)} = r_{2i} \ell_{2j} \lambda_2^t + o(\lambda_2^t) \tag{2.140}
$$

for large $t$. Thus the leading nonunit eigenvalue $\lambda_2$ plays an
important role in determining the rate at which absorption into either
$E_0$ or $E_M$ occurs.
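
The role of the leading nonunit eigenvalue can be seen numerically. The
sketch below takes the Wright-Fisher model (1.48) as the example chain,
assuming its usual binomial form, and forms $P^t$ by repeated
multiplication as in (2.137). For large $t$ the ratio of successive
$t$-step probabilities between transient states should approach
$\lambda_2$, which for this model is $1 - (2N)^{-1}$. The choice
$2N = 10$, the states compared and the function names are illustrative.

```python
from math import comb

def wright_fisher_matrix(two_n):
    """Transition matrix for binomial sampling of 2N genes:
    p_ij = C(2N, j) (i/2N)^j (1 - i/2N)^(2N - j)."""
    return [[comb(two_n, j) * (i / two_n) ** j
             * (1 - i / two_n) ** (two_n - j)
             for j in range(two_n + 1)]
            for i in range(two_n + 1)]

def mat_mul(a, b):
    """Plain-Python matrix product of two square matrices."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size))
             for j in range(size)]
            for i in range(size)]

two_n = 10
P = wright_fisher_matrix(two_n)

# Build P^80 and P^81 by repeated multiplication.
Pt = P
for _ in range(79):
    Pt = mat_mul(Pt, P)
Pt_next = mat_mul(Pt, P)

i, j = 3, 6                                  # any pair of transient states
ratio = Pt_next[i][j] / Pt[i][j]             # approaches lambda_2
print(ratio, 1 - 1 / two_n)
```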

Let $\pi_i$ be the probability that eventually $E_M$ (rather than $E_0$) is
entered, given initially that $X$ is in $E_i$. By considering values of $X$
at consecutive time points it is seen that the $\pi_i$ satisfy

$$
\pi_i = \sum_{j=0}^{M} p_{ij} \pi_j, \quad \pi_0 = 0, \quad \pi_M = 1.
\tag{2.141}
$$

For the genetic model (1.48) (with $M = 2N$) the solution of (2.141) was
$\pi_i = i/M$. The mean times $\bar{t}_i$ until absorption into $E_0$ or
$E_M$ occurs, given that $X$ is in $E_i$, similarly satisfy

$$
\bar{t}_i = \sum_{j=0}^{M} p_{ij} \bar{t}_j + 1, \quad
\bar{t}_0 = \bar{t}_M = 0. \tag{2.142}
$$

Starting with $X$ in $E_i$, the members of the set of mean times
$\{\bar{t}_{ij}\}$ that $X$ is in $E_j$ before absorption into either $E_0$
or $E_M$ satisfy the equations

$$
\bar{t}_{ij} = \sum_{k=0}^{M} p_{ik} \bar{t}_{kj} + \delta_{ij}, \quad
\bar{t}_{0j} = \bar{t}_{Mj} = 0, \tag{2.143}
$$

where $\delta_{ij} = 1$ if $i = j$ and $\delta_{ij} = 0$ otherwise. Further,

$$
\bar{t}_{ij} = \sum_{n=0}^{\infty} p_{ij}^{(n)}, \quad
\bar{t}_i = \sum_{j=1}^{M-1} \bar{t}_{ij}. \tag{2.144}
$$

An expression can also be found for the variance $\sigma_i^2$ of the time
before absorption, given initially $X$ in $E_i$, namely

$$
\sigma_i^2 = 2 \sum_{j=1}^{M-1} \bar{t}_{ij} \bar{t}_j - \bar{t}_i
             - (\bar{t}_i)^2. \tag{2.145}
$$
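
For a chain of modest size these quantities can all be obtained from one
linear solve. The sketch below uses the Wright-Fisher model (1.48) with
$M = 10$ as an example, assuming its usual binomial form. Restricting
(2.143) to the transient states gives $(I - Q)T = I$, where $Q$ holds the
transition probabilities among $E_1, \ldots, E_{M-1}$ and $T$ is the matrix
of the $\bar{t}_{ij}$. The elimination routine and all names are
illustrative.

```python
from math import comb

def wright_fisher_matrix(m):
    """Wright-Fisher transition matrix on states 0..m (binomial sampling)."""
    return [[comb(m, j) * (i / m) ** j * (1 - i / m) ** (m - j)
             for j in range(m + 1)] for i in range(m + 1)]

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.
    b is a matrix whose columns are right-hand sides; x is returned as a
    list of rows."""
    n = len(a)
    aug = [list(a[r]) + list(b[r]) for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

M = 10
P = wright_fisher_matrix(M)
transient = range(1, M)                       # E_1, ..., E_{M-1}
k = M - 1                                     # number of transient states

I_minus_Q = [[(1.0 if i == j else 0.0) - P[i][j] for j in transient]
             for i in transient]
identity = [[1.0 if i == j else 0.0 for j in transient] for i in transient]

# (2.143): mean numbers of visits; row r, column c refer to E_{r+1}, E_{c+1}.
T = solve(I_minus_Q, identity)
# (2.144): the mean absorption time is the row sum of the visit counts.
t_bar = [sum(row) for row in T]
# (2.141): probability of absorption at E_M, summing over the last visit.
pi = [sum(T[r][c] * P[c + 1][M] for c in range(k)) for r in range(k)]
# (2.145): variance of the absorption time.
var = [2 * sum(T[r][c] * t_bar[c] for c in range(k))
       - t_bar[r] - t_bar[r] ** 2 for r in range(k)]

print(pi[4], t_bar[4], var[4])                # values for the state E_5
```

For chains with many states a library linear solver would normally replace
the hand-written elimination; it is written out here only to keep the
sketch self-contained.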

It is possible to derive the general form of the distribution of the time
that $X$ is in $E_j$ if initially in $E_i$. Suppose that, starting in $E_i$,
the probability that $X$ ever enters $E_j$ is $\alpha_{ij}$ and that once in
$E_j$, the probability that $X$ ever returns to $E_j$ is $r_j$. Then the
probability that $E_j$ is occupied exactly $n$ times before absorption
takes place at $E_0$ or $E_M$ is

$$
\begin{cases}
1 - \alpha_{ij} & \text{for } n = 0, \\
\alpha_{ij} (r_j)^{n-1} (1 - r_j) & \text{for } n \geq 1.
\end{cases}
\tag{2.146}
$$

This is clearly a modified geometric distribution. The mean is thus

$$
\bar{t}_{ij} = \alpha_{ij} (1 - r_j) \sum_{n=1}^{\infty} n (r_j)^{n-1}
             = \alpha_{ij}/(1 - r_j) \tag{2.147}
$$

and the variance is

$$
\sigma_{ij}^2 = \alpha_{ij} (1 - r_j) \sum_{n=1}^{\infty} n^2 (r_j)^{n-1}
                - \bar{t}_{ij}^2
              = \bar{t}_{ij} \{1 - \bar{t}_{ij} + 2 r_j/(1 - r_j)\}.
\tag{2.148}
$$

It is possible to find an expression for $r_j$ and hence to calculate
(2.148) but we do not enter into details here.

Consider now only those cases for which $E_M$ is the absorbing state
eventually entered. Writing $X_t$ for the value of $X$ at time $t$, we get

$$
\begin{aligned}
p_{ij}^* &= \text{Prob}\{X_{t+1} \text{ in } E_j \mid X_t \text{ in } E_i,\
            E_M \text{ eventually entered}\} \\
&= \text{Prob}\{X_{t+1} \text{ in } E_j \text{ and } E_M
   \text{ eventually entered} \mid X_t \text{ in } E_i\} \\
&\qquad \div\ \text{Prob}\{E_M \text{ eventually entered} \mid
   X_t \text{ in } E_i\} \\
&= p_{ij} \pi_j/\pi_i, \qquad (i, j = 1, 2, \ldots, M)
\end{aligned}
\tag{2.149}
$$



using conditional probability arguments and the Markovian nature of $X$.
Let $\tilde{P}$ be the matrix derived from $P$ by omitting the first row and
first column and let

$$
V = \begin{pmatrix}
\pi_1 &       &        & 0 \\
      & \pi_2 &        &   \\
      &       & \ddots &   \\
0     &       &        & \pi_M
\end{pmatrix}.
\tag{2.150}
$$

Then if $P^* = \{p_{ij}^*\}$, (2.149) shows that

$$
P^* = V^{-1} \tilde{P} V. \tag{2.151}
$$

Standard theory shows that the eigenvalues of $P^*$ are identical to those
of $P$ (with one unit eigenvalue omitted) and that if $\ell'$ ($r$) is any
left (right) eigenvector of $\tilde{P}$, then the corresponding left and
right eigenvectors of $P^*$ are $\ell' V$ and $V^{-1} r$. Further, if
$P^{*(n)}$ is the matrix of conditional $n$ step transition probabilities,

$$
P^{*(n)} = (P^*)^n = V^{-1} \tilde{P}^n V
$$

so that

$$
p_{ij}^{*(n)} = p_{ij}^{(n)} \pi_j/\pi_i, \tag{2.152}
$$

a conclusion that can be reached directly as with (2.149). If
$\bar{t}_{ij}^*$ is the conditional mean time spent in $E_j$, given
initially $X$ in $E_i$, then

$$
\bar{t}_{ij}^* = \sum_{n=0}^{\infty} p_{ij}^{*(n)}
  = (\pi_j/\pi_i) \sum_{n=0}^{\infty} p_{ij}^{(n)}
  = \bar{t}_{ij} \pi_j/\pi_i. \tag{2.153}
$$

If there is only one absorbing state, interest centers solely on properties
of the time until the state is entered. Taking $E_0$ as the only absorbing
state and $E_i$ as the initial state, the mean time $\bar{t}_i$ until
absorption satisfies (2.142) with the single boundary condition
$\bar{t}_0 = 0$, and the mean number of visits to $E_j$ satisfies (2.143)
with the single condition $\bar{t}_{0j} = 0$.

If there are no absorbing states $P$ will have a single unit eigenvalue and
all other eigenvalues will be strictly less than unity in absolute value.
Equation (2.138) then shows that

$$
\lim_{t \to \infty} P^t = r_0 \ell_0', \tag{2.154}
$$

and since $r_0$ is of the form $(1, 1, 1, \ldots, 1)'$,

$$
\lim_{t \to \infty} p_{ij}^{(t)} = \ell_{0j} \quad \text{for all } i.
\tag{2.155}
$$

Using a slightly different notation we may summarize this by saying

$$
\lim_{t \to \infty} p_{ij}^{(t)} = \phi_j, \tag{2.156}
$$

where $\phi' = (\phi_0, \phi_1, \ldots, \phi_M)$ is the unique solution of
the two equations

$$
\phi' = \phi' P, \quad \sum_{j=0}^{M} \phi_j = 1. \tag{2.157}
$$

The vector $\phi$ is called the stationary distribution of the process and
in genetical applications exists only if fixation of any allele is
impossible (e.g. if all alleles mutate at positive rates).

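If the matrix $P$ is a continuant (so that $p_{ij} = 0$ if $|i - j| > 1$)
explicit formulas can be found for most of these quantities. We write
$p_{i,i+1} = \lambda_i$ and $p_{i,i-1} = \mu_i$ in conformity with standard
notation in this case. If $E_0$ and $E_M$ are both absorbing states the
probability $\pi_i$ in (2.141) becomes, explicitly,

$$
\pi_i = \sum_{k=0}^{i-1} \rho_k \Big/ \sum_{k=0}^{M-1} \rho_k, \tag{2.158}
$$

where

$$
\rho_0 = 1, \quad
\rho_k = \frac{\mu_1 \mu_2 \cdots \mu_k}
              {\lambda_1 \lambda_2 \cdots \lambda_k}.
$$

Further

$$
\bar{t}_{ij} = \begin{cases}
\dfrac{(1 - \pi_i) \sum_{k=0}^{j-1} \rho_k}{\rho_{j-1} \mu_j}
  & (j = 1, \ldots, i), \\[2ex]
\dfrac{\pi_i \sum_{k=j}^{M-1} \rho_k}{\rho_{j-1} \mu_j}
  & (j = i + 1, \ldots, M - 1).
\end{cases}
\tag{2.159}
$$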
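
The explicit formulas (2.158) and (2.159) can be checked against a direct
solution of (2.143). The sketch below does this for a small continuant
chain with both boundary states absorbing. The numerical values of
$\lambda_i$ and $\mu_i$ are arbitrary illustrative choices, and the
elimination routine and function names are not part of the text.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting;
    the columns of b are right-hand sides, x is returned as rows."""
    n = len(a)
    aug = [list(a[r]) + list(b[r]) for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

M = 6
# Illustrative (assumed) probabilities for the transient states 1..M-1.
lam = {i: 0.30 + 0.02 * i for i in range(1, M)}   # p_{i,i+1}
mu = {i: 0.25 - 0.01 * i for i in range(1, M)}    # p_{i,i-1}

# rho_0 = 1, rho_k = (mu_1 ... mu_k) / (lambda_1 ... lambda_k).
rho = [1.0]
for k in range(1, M):
    rho.append(rho[-1] * mu[k] / lam[k])

def pi(i):
    """Probability (2.158) of absorption at E_M starting from E_i."""
    return sum(rho[:i]) / sum(rho[:M])

def t_bar(i, j):
    """Mean number of visits (2.159) to E_j starting from E_i."""
    if j <= i:
        return (1 - pi(i)) * sum(rho[:j]) / (rho[j - 1] * mu[j])
    return pi(i) * sum(rho[j:M]) / (rho[j - 1] * mu[j])

def p(i, j):
    """Transition probability between transient states of the chain."""
    if j == i + 1:
        return lam[i]
    if j == i - 1:
        return mu[i]
    if j == i:
        return 1 - lam[i] - mu[i]
    return 0.0

states = range(1, M)
I_minus_Q = [[(1.0 if i == j else 0.0) - p(i, j) for j in states]
             for i in states]
identity = [[1.0 if i == j else 0.0 for j in states] for i in states]
T = solve(I_minus_Q, identity)                # direct solution of (2.143)

max_error = max(abs(T[i - 1][j - 1] - t_bar(i, j))
                for i in states for j in states)
print(max_error)
```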



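Equations (2.144) and (2.153) then yield $\bar{t}_i$, $\bar{t}_{ij}^*$ and
$\bar{t}_i^*$ immediately. When there is only one absorbing state (2.144)
still holds, but now $\bar{t}_{ij}$ is defined by

$$
\bar{t}_{ij} = \begin{cases}
\dfrac{1}{\mu_j} \left\{ 1 + \dfrac{\lambda_{j-1}}{\mu_{j-1}}
  + \dfrac{\lambda_{j-1} \lambda_{j-2}}{\mu_{j-1} \mu_{j-2}} + \cdots
  + \dfrac{\lambda_{j-1} \lambda_{j-2} \cdots \lambda_1}
          {\mu_{j-1} \mu_{j-2} \cdots \mu_1} \right\}
  & (j = 1, 2, \ldots, i), \\[2ex]
\bar{t}_{ii}\, \dfrac{\lambda_i \lambda_{i+1} \cdots \lambda_{j-1}}
                    {\mu_{i+1} \mu_{i+2} \cdots \mu_j}
  & (j = i + 1, \ldots, M)
\end{cases}
\tag{2.160}
$$

if $E_0$ is the absorbing state and by

$$
\bar{t}_{ij} = \begin{cases}
\bar{t}_{ii}\, \dfrac{\mu_{j+1} \mu_{j+2} \cdots \mu_i}
                    {\lambda_j \lambda_{j+1} \cdots \lambda_{i-1}}
  & (j = 0, 1, \ldots, i - 1), \\[2ex]
\dfrac{1}{\lambda_j} \left\{ 1 + \dfrac{\mu_{j+1}}{\lambda_{j+1}}
  + \dfrac{\mu_{j+1} \mu_{j+2}}{\lambda_{j+1} \lambda_{j+2}} + \cdots
  + \dfrac{\mu_{j+1} \mu_{j+2} \cdots \mu_{M-1}}
          {\lambda_{j+1} \lambda_{j+2} \cdots \lambda_{M-1}} \right\}
  & (j = i, i + 1, \ldots, M - 1)
\end{cases}
\tag{2.161}
$$

if $E_M$ is the absorbing state. In this case of course there can be no
further concept of a conditional mean absorption time.

Finally, when there are no absorbing states, the stationary distribution
$\phi$ is defined by

$$
\phi_i = \phi_0\, \frac{\lambda_0 \lambda_1 \cdots \lambda_{i-1}}
                       {\mu_1 \mu_2 \cdots \mu_i}, \tag{2.162}
$$

where $\phi_0$ is chosen so that $\sum_i \phi_i = 1$.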

Various further results are possible for continuant Markov chain models, 
an accessible summary being given in Kemeny and Snell (1960). We shall 
draw on the formulas given above on a number of occasions throughout 
this book. 

We conclude our discussion of finite Markov chains by introducing the
concept of time reversibility. Consider a Markov chain admitting a
stationary distribution $\{\phi_0, \phi_1, \ldots, \phi_M\}$. Then we define
the process to be reversible if, at stationarity,

$$
\text{Prob}\{X_t, X_{t+1}, \ldots, X_{t+n}\}
  = \text{Prob}\{X_t, X_{t-1}, \ldots, X_{t-n}\} \tag{2.163}
$$

for every $t$ and $n$. A necessary and sufficient condition for this is that
the stationary state has been reached and that the equation

$$
\phi_i p_{ij} = \phi_j p_{ji} \tag{2.164}
$$

hold for all $i, j$. Certain classes of Markov chains are always reversible.
For example, if the transition matrix is a continuant, (2.162) and (2.164)
jointly show that the Markov chain at stationarity is reversible. Certain
other chains, in particular several having genetical relevance, are
reversible: we shall consider these later when discussing the uses to which
the concept of reversibility can be put.
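
The reversibility of a continuant chain can be verified numerically. The
sketch below builds the stationary distribution of a small continuant
chain with no absorbing states from the product formula (2.162), then
checks both the stationarity equations (2.157) and the balance condition
(2.164). The numerical values of $\lambda_i$ and $\mu_i$ are arbitrary
illustrative choices.

```python
M = 5
# Illustrative (assumed) probabilities; lam[M] = mu[0] = 0, so that the
# chain stays within 0..M and no state is absorbing.
lam = [0.4, 0.3, 0.3, 0.2, 0.2, 0.0]    # lam[i] = p_{i,i+1}
mu = [0.0, 0.1, 0.2, 0.2, 0.3, 0.5]     # mu[i]  = p_{i,i-1}

def p(i, j):
    """Transition probability of the continuant chain."""
    if j == i + 1:
        return lam[i]
    if j == i - 1:
        return mu[i]
    if j == i:
        return 1 - lam[i] - mu[i]
    return 0.0

# (2.162): phi_i = phi_0 (lam_0 ... lam_{i-1}) / (mu_1 ... mu_i).
weights = [1.0]
for i in range(1, M + 1):
    weights.append(weights[-1] * lam[i - 1] / mu[i])
total = sum(weights)
phi = [w / total for w in weights]

# Stationarity: phi' = phi' P.
stationarity_error = max(
    abs(phi[j] - sum(phi[i] * p(i, j) for i in range(M + 1)))
    for j in range(M + 1))

# Balance condition phi_i p_ij = phi_j p_ji for every pair of states.
balance_error = max(
    abs(phi[i] * p(i, j) - phi[j] * p(j, i))
    for i in range(M + 1) for j in range(M + 1))

print(phi)
print(stationarity_error, balance_error)
```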




3 

Discrete Stochastic Models 



3.1 Introduction 

In the last section of the previous chapter some elementary finite Markov 
chain theory was introduced. In this chapter we apply this theory to var- 
ious Markov chain models which arise in genetics. We shall find that the 
complexities of these models are such that not all questions of genetical
interest can in practice be answered by using Markov chain theory, and 
in the next two chapters we shall introduce diffusion theory to arrive at 
a more complete, although approximate, description of the properties of 
Markov chain models of interest in genetics. 



3.2 Wright-Fisher Model: Two Alleles 

In Chapter 1 we were led to the Wright-Fisher model (1.48) as a simple 
approximate representation of the stochastic behavior of gene frequencies 
in an idealized finite population. Our first aim is to discuss some of the 
properties of this model in the light of the theory of Section 2.12. We 
have already noted that in the model (1.48), the number $X$ of $A_1$ genes
is a Markovian random variable with two absorbing states, $X = 0$ and
$X = 2N$. Further, the probability that eventually $X = 2N$, given that
initially $X = i$, is simply $i/2N$.

We now ask whether the theory of Section 2.12 gives us further information
on the behavior of $X$ before an absorbing state is reached. The most
interesting quantities are the mean time $\bar{t}_i$ until absorption, given
initially $X = i$, and the mean number of times $\bar{t}_{ij}$ that $X$ takes
the value $j$ before absorption. While in principle these expressions can be
found from (2.142) and (2.143), in practice solution of these equations
seems extremely difficult for this model, and simple expressions for these
mean times have not yet been found. It is indeed likely that no simple
expressions exist for them. On the other hand, it is possible to find a
simple approximation for $\bar{t}_i$ by the following line of argument.

In (2.142) we put $M = 2N$, $i/M = x$, $j/M = x + \delta x$, and
$\bar{t}_i = \bar{t}(x)$. We suppose $\bar{t}(x)$ is a twice differentiable
function of a continuous variable $x$. Then (2.142) can be written

\begin{align}
\bar{t}(x) &= \sum \text{Prob}\{x \to x + \delta x\}\,
              \bar{t}(x + \delta x) + 1 \tag{3.1} \\
&= E\{\bar{t}(x + \delta x)\} + 1 \tag{3.2} \\
&\approx \bar{t}(x) + E(\delta x)\{\bar{t}(x)\}'
  + \tfrac{1}{2} E(\delta x)^2 \{\bar{t}(x)\}'' + 1, \tag{3.3}
\end{align}

where all expectations are conditional on $x$ and in (3.3) only the first
three terms in an infinite Taylor series have been retained. Since from
(1.48)

$$
E(\delta x) = 0, \quad E(\delta x)^2 = (2N)^{-1} x(1 - x),
$$

(3.3) gives

$$
x(1 - x)\{\bar{t}(x)\}'' \approx -4N. \tag{3.4}
$$

The solution of this equation, subject to the boundary conditions
$\bar{t}(0) = \bar{t}(1) = 0$, is

$$
\bar{t}(p) \approx -4N\{p \log p + (1 - p) \log(1 - p)\}, \tag{3.5}
$$

where $p = i/2N$ is the initial frequency of $A_1$. We shall see later that
this is the so-called diffusion approximation to the mean absorption time,
although we have here not made any reference to diffusion processes.

In the case $i = 1$, so that $p = (2N)^{-1}$, the value appropriate if $A_1$
is a unique new mutation in an otherwise purely $A_2A_2$ population, (3.5)
reduces to

$$
\bar{t}\{(2N)^{-1}\} \approx 2 + 2 \log 2N \text{ generations}, \tag{3.6}
$$

while when $p = \tfrac{1}{2}$,

$$
\bar{t}\{\tfrac{1}{2}\} \approx 2.8N \text{ generations}. \tag{3.7}
$$

This very long mean time, for equal initial frequencies, is of course
intimately connected with the fact that the leading nonunit eigenvalue of
the transition matrix in (1.48) is very close to unity.
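
Although no simple exact expression is available, for small $N$ the mean
absorption times can be computed numerically from (2.142) and compared
with the approximation (3.5). The following sketch does this for $N = 20$,
assuming the usual binomial form of the Wright-Fisher model (1.48). The
elimination routine and the function names are illustrative.

```python
from math import comb, log

def wright_fisher_matrix(m):
    """Wright-Fisher transition matrix on states 0..m (binomial sampling)."""
    return [[comb(m, j) * (i / m) ** j * (1 - i / m) ** (m - j)
             for j in range(m + 1)] for i in range(m + 1)]

def mean_absorption_times(P):
    """Solve (2.142) on the transient states by Gauss-Jordan elimination.
    Returns a list indexed by state, with zeros at the two boundaries."""
    m = len(P) - 1
    n = m - 1
    # Augmented system (I - Q) t = 1 over the states 1..m-1.
    aug = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(1, m)] + [1.0]
           for i in range(1, m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [0.0] + [row[n] for row in aug] + [0.0]

def diffusion_time(p, N):
    """Approximation (3.5) to the mean absorption time."""
    return -4 * N * (p * log(p) + (1 - p) * log(1 - p))

N = 20
exact = mean_absorption_times(wright_fisher_matrix(2 * N))
for i in (1, N, 2 * N - 1):
    print(i, round(exact[i], 2), round(diffusion_time(i / (2 * N), N), 2))
```

The comparison is most favorable at intermediate frequencies; near the
boundaries the discreteness of the chain matters more.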

Suppose now the condition is made that $A_1$ eventually fixes. The possible
values for $X$ are $1, 2, 3, \ldots, 2N$ and (2.149) shows that the
conditional transition probability $p_{ij}^*$ is

$$
\begin{aligned}
p_{ij}^* &= \binom{2N}{j} \left(\frac{i}{2N}\right)^{j}
            \left(\frac{2N - i}{2N}\right)^{2N-j} \frac{j}{i} \\
&= \binom{2N - 1}{j - 1} \left(\frac{i}{2N}\right)^{j-1}
   \left(\frac{2N - i}{2N}\right)^{2N-j}.
\end{aligned}
\tag{3.8}
$$
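
The equality of the two forms in (3.8), and the fact that the conditional
probabilities sum to one over $j = 1, \ldots, 2N$, can be confirmed
numerically. This is a minimal sketch with $2N = 12$ chosen arbitrarily;
the function names are illustrative.

```python
from math import comb

def p(i, j, two_n):
    """Unconditional Wright-Fisher transition probability."""
    x = i / two_n
    return comb(two_n, j) * x ** j * (1 - x) ** (two_n - j)

def p_star(i, j, two_n):
    """Second form of (3.8): j - 1 further A1 genes among 2N - 1
    transmissions."""
    x = i / two_n
    return comb(two_n - 1, j - 1) * x ** (j - 1) * (1 - x) ** (two_n - j)

two_n = 12
states = range(1, two_n + 1)

# First form of (3.8): p_ij * pi_j / pi_i with pi_i = i/2N, that is
# p_ij * j / i.
gap = max(abs(p(i, j, two_n) * j / i - p_star(i, j, two_n))
          for i in states for j in states)
row_sums = [sum(p_star(i, j, two_n) for j in states) for i in states]
print(gap, row_sums)
```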



An intuitive explanation for the form of $p_{ij}^*$ is that under the
condition that $A_1$ fixes, at least one $A_1$ gene must be produced in each
generation. Then $p_{ij}^*$ is the probability that the remaining $2N - 1$
gene transmissions produce exactly $j - 1$ $A_1$ genes. An argument parallel
to that leading to (3.4) gives

$$
(1 - x)\{\bar{t}^*(x)\}' + \tfrac{1}{2} x(1 - x)\{\bar{t}^*(x)\}'' = -2N
\tag{3.9}
$$

for the conditional mean time $\bar{t}^*(x)$ to fixation, given a current
frequency of $x$. The solution of (3.9), subject to $\bar{t}^*(1) = 0$ and
the requirement

$$
\lim_{x \to 0} \bar{t}^*(x) \text{ is finite}, \tag{3.10}
$$

and assuming initially $x = p$, is

$$
\bar{t}^*(p) = -4N p^{-1} (1 - p) \log(1 - p). \tag{3.11}
$$

We observe from this that

\begin{align}
\bar{t}^*\{(2N)^{-1}\} &\approx 4N - 2 \text{ generations}, \tag{3.12} \\
\bar{t}^*\{\tfrac{1}{2}\} &\approx 2.8N \text{ generations}, \tag{3.13} \\
\bar{t}^*\{1 - (2N)^{-1}\} &\approx 2 \log 2N \text{ generations}. \tag{3.14}
\end{align}

The approximation (3.13) is to be expected from (3.7), since by symmetry,
when the initial frequency of $A_1$ is $\tfrac{1}{2}$ the conditioning should
have no effect on the mean fixation time. On the other hand, (3.12) and
(3.14) provide new information, and show that while when the initial
frequency of $A_1$ is $(2N)^{-1}$ it is very unlikely that fixation of $A_1$
will occur, in the small fraction of cases when fixation of $A_1$ does
occur, an extremely long fixation time may be expected. Further conclusions
will be given later when we consider the diffusion approximation to the
Wright-Fisher model (1.48).

As noted in Chapter 1, the initial analysis of the model (1.48) by Fisher 
and Wright paid particular attention to the leading eigenvalue of the tran- 
sition matrix, regarded as a measure of the rate at which one or other allele 
is lost from the population. Although, as we see below, the eigenvalues are 
of less use than expressions like (3.5) and (3.11) for this purpose, they are 
nevertheless of some interest, so we now write down the formulas for these 
eigenvalues. 

Since the matrix defined by the $p_{ij}$ in (1.48) is the transition matrix
of a Markov chain, it follows that one eigenvalue of the matrix is
automatically 1. Denoting this eigenvalue by $\lambda_0$, the remaining
eigenvalues, first derived by Feller (1951), are

$$
\lambda_j = (2N)(2N - 1) \cdots (2N - j + 1)/(2N)^j, \quad
j = 1, 2, \ldots, 2N. \tag{3.15}
$$

This confirms the values $\lambda_1 = 1$ and $\lambda_2 = 1 - (2N)^{-1}$
found earlier by other methods. We derive the eigenvalues in (3.15) in
Section 3.3 as particular cases of an important model of Cannings (1974)
which generalizes the Wright-Fisher model.
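
A simple consistency check on (3.15) is available without computing any
eigenvectors, since the sum of the eigenvalues of a matrix equals its
trace. The sketch below compares $1 + \sum_j \lambda_j$ with the sum of the
diagonal elements $p_{ii}$ of the Wright-Fisher matrix, assuming its usual
binomial form; $2N = 8$ and the function name are illustrative choices.

```python
from math import comb

def eigenvalue(j, two_n):
    """lambda_j from (3.15): (2N)(2N-1)...(2N-j+1) / (2N)^j."""
    value = 1.0
    for k in range(j):
        value *= (two_n - k) / two_n
    return value

two_n = 8
# lambda_0 = 1 followed by lambda_1, ..., lambda_{2N}.
eigenvalues = [1.0] + [eigenvalue(j, two_n) for j in range(1, two_n + 1)]

# Trace of the Wright-Fisher transition matrix: the sum of the p_ii.
trace = sum(comb(two_n, i) * (i / two_n) ** i
            * (1 - i / two_n) ** (two_n - i)
            for i in range(two_n + 1))

print(eigenvalues[1], eigenvalues[2])
print(trace, sum(eigenvalues))
```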

Although considerable attention has been paid to the leading nonunit 
eigenvalue $\lambda_2$ and, to a lesser extent, to the complete set (3.15), it is
possible to argue that these eigenvalues are of limited usefulness. First, 
(2.151) shows that the eigenvalues in the conditional process, where even- 
tual fixation of a specified allele is assumed, are the same as those in the 
unconditional process. On the other hand, the mean fixation time values 
are quite different in the two cases, as (3.5) and (3.11) show, and thus are 
not adequately described by knowledge of the eigenvalues alone. Second, 
we shall show later that at least in the model (1.48), by the time that the 
term defined by the leading nonunit eigenvalue in the spectral expansion 
(2.138) dominates the remaining terms, it is very likely that loss or fixation 
of $A_1$ will already have occurred.

Suppose now that $A_1$ mutates to $A_2$ at rate $u$ but that there is no
mutation from $A_2$ to $A_1$. It is then reasonable to replace the model
(1.48) by

$$
p_{ij} = \binom{2N}{j} \psi_i^{\,j} (1 - \psi_i)^{2N-j} \tag{3.16}
$$

where $\psi_i = i(1 - u)/2N$. Here eventual loss of $A_1$ is certain, and
interest centers on properties of the time until $A_1$ is lost, either using
eigenvalues or mean time properties. For the moment we consider mean time
properties and note that an argument parallel to that leading to (3.4) shows
that to a first approximation, the mean time $\bar{t}(x)$, given a current
frequency $x$, satisfies

$$
-4Nux\{\bar{t}(x)\}' + x(1 - x)\{\bar{t}(x)\}'' = -4N. \tag{3.17}
$$

If initially $x = p$, the solution of this equation, subject to the
requirements

$$
\bar{t}(0) = 0, \quad \lim_{x \to 1} \bar{t}(x) \text{ is finite},
$$

is

$$
\bar{t}(p) = \int_0^1 t(x, p)\, dx \text{ generations}, \tag{3.18}
$$

where for $\theta \neq 1$,

$$
t(x, p) = \begin{cases}
4N x^{-1} (1 - \theta)^{-1} \{(1 - x)^{\theta - 1} - 1\},
  & 0 < x \leq p, \\
4N x^{-1} (1 - \theta)^{-1} (1 - x)^{\theta - 1}
  \{1 - (1 - p)^{1 - \theta}\}, & p \leq x < 1,
\end{cases}
\tag{3.19}
$$

and $\theta = 4Nu$. The corresponding formulas for the case $\theta = 1$ are
found from (3.19) by standard limiting processes.

It may be shown (Griffiths, 2003) that with the definition of $t(x, p)$ in
(3.19), $\bar{t}(p)$ may be written as

$$
\bar{t}(p) = \sum_{j=1}^{\infty} \frac{4N}{j(j - 1 + \theta)}
             \{1 - (1 - p)^j\}. \tag{3.20}
$$

The function $t(x, p)$ in (3.19) is more informative than it initially
appears since, as we see later, $t(x, p)\,\delta x$ provides an excellent
approximation to the mean number of generations for which the frequency of
$A_1$ takes a value in $(x, x + \delta x)$ before reaching zero.

There are two interesting special cases of (3.20). First, when
$\theta = 2$,

$$
t(x, p) = \begin{cases}
4N, & 0 < x \leq p, \\
4N x^{-1} (1 - x)\{(1 - p)^{-1} - 1\}, & p \leq x < 1,
\end{cases}
\tag{3.21}
$$

and from this,

$$
\bar{t}(p) = -\frac{4Np \log p}{1 - p}, \tag{3.22}
$$

a conclusion that can also be found directly from (3.20). Second, when
$p = 1$, (3.20) gives immediately

$$
\bar{t}(1) = \sum_{j=1}^{\infty} \frac{4N}{j(j - 1 + \theta)}. \tag{3.23}
$$

We shall return to these two cases later, when discussing the expressions
in (9.102) and (9.95).
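
The agreement between the integral (3.18), the series (3.20) and the
closed form (3.22) for $\theta = 2$ can be checked numerically. The sketch
below works in units of $N$ (that is, with $N = 1$) and uses a midpoint
rule and a truncated series, so the three values agree only to within the
numerical error of those approximations. The value $p = 0.3$, the step and
term counts, and the function names are illustrative.

```python
from math import log

def t_density(x, p, theta, N=1.0):
    """The function t(x, p) of (3.19), valid for theta != 1."""
    if x <= p:
        return 4 * N / (x * (1 - theta)) * ((1 - x) ** (theta - 1) - 1)
    return (4 * N / (x * (1 - theta)) * (1 - x) ** (theta - 1)
            * (1 - (1 - p) ** (1 - theta)))

def mean_time_integral(p, theta, N=1.0, steps=20_000):
    """Midpoint-rule evaluation of the integral in (3.18)."""
    h = 1.0 / steps
    return h * sum(t_density((k + 0.5) * h, p, theta, N)
                   for k in range(steps))

def mean_time_series(p, theta, N=1.0, terms=200_000):
    """Partial sum of the series (3.20)."""
    return sum(4 * N / (j * (j - 1 + theta)) * (1 - (1 - p) ** j)
               for j in range(1, terms + 1))

p, theta = 0.3, 2.0
by_integral = mean_time_integral(p, theta)
by_series = mean_time_series(p, theta)
closed_form = -4 * p * log(p) / (1 - p)       # (3.22), in units of N
print(by_integral, by_series, closed_form)
```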

Suppose next that $A_2$ also mutates to $A_1$ at rate $v$. It is now reasonable
to define $\psi_i$ in (3.16) by

$$\psi_i = \{i(1-u) + (2N-i)v\}/2N. \tag{3.24}$$

There now exists a stationary distribution $\phi' = (\phi_0, \phi_1, \ldots, \phi_{2N})$ for the
number of $A_1$ genes, given in principle by (2.157). The exact form of this
distribution is complex, and we consider later an approximation to it. On
the other hand, certain properties of this distribution can be extracted
from (3.16) and (3.24). The stationary distribution satisfies the equation
$\phi' = \phi' P$, where $P$ is defined by (3.16) and (3.24), so that if $\xi$ is a vector
with $i$th element $i$ ($i = 0, 1, 2, \ldots, 2N$) and $\mu$ is the mean of the stationary
distribution,

$$\mu = \phi'\xi = \phi' P \xi.$$

The $i$th ($i = 0, 1, 2, \ldots, 2N$) component of $P\xi$ is

$$\sum_j j \binom{2N}{j} \psi_i^j (1-\psi_i)^{2N-j},$$

and from the standard formula for the mean of the binomial distribution,
this is $2N\psi_i$ or

$$i(1-u) + (2N-i)v.$$

Thus,

$$\phi' P \xi = \sum_i \{i(1-u) + (2N-i)v\}\phi_i = \mu(1-u) + v(2N-\mu).$$

It follows that

$$\mu = (1-u)\mu + v(2N-\mu)$$

or

$$\mu = 2Nv/(u+v). \tag{3.25}$$

In view of the deterministic stationary frequency (1.33), this value is not
surprising. Similar arguments show that the variance $\sigma^2$ of the stationary
distribution is

$$\sigma^2 = 4N^2uv/\{(u+v)^2(4Nu + 4Nv + 1)\} + \text{smaller order terms}. \tag{3.26}$$

Further moments can also be found, but we do not pursue the details. 
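The mean (3.25) can be confirmed for a small population by computing the stationary distribution directly. The sketch below is illustrative only: the function names and the use of plain power iteration are assumptions, and the binomial form of (3.16) is taken from its description in the text.

```python
from math import comb

def wf_mutation_matrix(two_n, u, v):
    """Binomial transition matrix (3.16) with psi_i given by (3.24)."""
    rows = []
    for i in range(two_n + 1):
        psi = (i * (1 - u) + (two_n - i) * v) / two_n
        rows.append([comb(two_n, j) * psi ** j * (1 - psi) ** (two_n - j)
                     for j in range(two_n + 1)])
    return rows

def stationary(P, sweeps=2000):
    """Iterate phi' <- phi' P, starting from the uniform distribution."""
    n = len(P)
    phi = [1.0 / n] * n
    for _ in range(sweeps):
        phi = [sum(phi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return phi

def mean_and_variance(phi):
    """Mean and variance of the number of A_1 genes under phi."""
    mean = sum(j * w for j, w in enumerate(phi))
    var = sum((j - mean) ** 2 * w for j, w in enumerate(phi))
    return mean, var
```

With `two_n = 20`, `u = 0.02` and `v = 0.01`, the computed mean should equal $2Nv/(u+v) = 20/3$ up to rounding, since (3.25) is exact. The variance can be compared with the leading term of (3.26), which is only approximate.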

The above values are sufficient to answer a question of some interest in
population genetics, namely "what is the probability that two genes drawn
together at random are of the same allelic type?" If the frequency of $A_1$ is
$x$ and terms of order $N^{-1}$ are ignored, this probability is $x^2 + (1-x)^2$. The
required value is the expected value of this over the stationary distribution,
namely

$$E\{x^2 + (1-x)^2\} = 1 - 2E(x) + 2E(x^2).$$

If $u = v$, $4Nu = \theta$, (3.25) and (3.26) together show that this is

$$\text{Prob (two genes of same allelic type)} \approx (1+\theta)/(1+2\theta). \tag{3.27}$$

This probability can be arrived at in another way, which we now consider
since it is useful for purposes of generalization. Let the required probability
be $F$ and note that this is the same in two consecutive stationary generations.
Two genes drawn at random in any generation will have a common
parent gene with probability $(2N)^{-1}$, or different parent genes with probability
$1 - (2N)^{-1}$, which will be of the same allelic type with probability $F$.
The probability that neither of the genes drawn is a mutant, or that both
are, is $u^2 + (1-u)^2$, while the probability that precisely one is a mutant is
$2u(1-u)$. It follows that

$$F = \{u^2 + (1-u)^2\}\left\{\frac{1}{2N} + F\left(1 - \frac{1}{2N}\right)\right\} + 2u(1-u)(1-F)\left(1 - \frac{1}{2N}\right).$$

Thus exactly

$$F = \frac{1 + 2u(1-u)(2N-2)}{1 + 4u(1-u)(2N-1)},$$

and approximately

$$F \approx (1+\theta)/(1+2\theta), \tag{3.28}$$

in agreement with (3.27). A third approach (see (5.71)) yields the same
answer.
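The exact solution of the recursion for $F$ and the approximation (3.28) are easily compared. The following sketch uses exact rational arithmetic; the function names are illustrative, and `two_n` stands for $2N$.

```python
from fractions import Fraction

def f_exact(two_n, u):
    """Exact stationary solution of the recursion for F."""
    b = 2 * u * (1 - u)
    return (1 + b * (two_n - 2)) / (1 + 2 * b * (two_n - 1))

def f_approx(two_n, u):
    """Approximation (3.28), with theta = 4Nu = 2 * two_n * u."""
    theta = 2 * two_n * u
    return (1 + theta) / (1 + 2 * theta)

def recursion_residual(two_n, u):
    """F minus the right-hand side of the recursion, evaluated at f_exact."""
    F = f_exact(two_n, u)
    c = Fraction(1, two_n)
    none_or_both = u * u + (1 - u) ** 2   # neither gene, or both, mutant
    exactly_one = 2 * u * (1 - u)         # precisely one gene mutant
    return F - (none_or_both * (c + F * (1 - c))
                + exactly_one * (1 - F) * (1 - c))
```

Passing `u` as a `Fraction` makes the residual exactly zero, which confirms the algebra leading to the exact expression. For $2N = 200$ and $u = 10^{-3}$ (so that $\theta = 0.4$) the exact and approximate values differ only in the fourth decimal place.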

Suppose now that selection exists and that the genotypes $A_1A_1$, $A_1A_2$,
and $A_2A_2$ have fitnesses given by (1.25a). In view of (1.24) a reasonable
stochastic model is found by assuming that the transition matrix for the
number of $A_1$ individuals is (3.16), where now

$$\psi_i = (\bar w)^{-1}\big(\{w_{11}x^2 + w_{12}x(1-x)\}(1-u) + \{w_{12}x(1-x) + w_{22}(1-x)^2\}v\big), \tag{3.29}$$

where $x = i/2N$ and $\bar w$ is defined by (1.38). The qualitative properties of
this model are clear: When $u = v = 0$, one or other absorbing state, $X = 0$,
$X = 2N$, is eventually reached. When $u > 0$, $v = 0$, $A_1$ is eventually
lost from the population, and when $u, v > 0$ there will exist a stationary
distribution for the number of $A_1$ genes. Essentially no quantitative results
concerning this behavior are known, and the best that can be done is to
consider approximations. We do this in Chapter 5 by using diffusion theory,
and for the moment foreshadow this approach by deriving an approximate
formula for absorption probabilities when $u = v = 0$.

We suppose that $w_{11} = 1 + s$, $w_{12} = 1 + sh$ and $w_{22} = 1$, where $s$ is of
order $N^{-1}$. Put $\alpha = 2Ns$ and, in (2.141), write $i = 2Nx$, $j = 2N(x + \delta x)$.
Then this equation may be written

$$\pi(x) = \sum \text{Prob}(x \to x + \delta x)\,\pi(x + \delta x)$$
$$\approx \sum \text{Prob}(x \to x + \delta x)\{\pi(x) + \delta x\,\pi'(x) + \tfrac{1}{2}(\delta x)^2\pi''(x)\}$$
$$= \pi(x) + E(\delta x)\,\pi'(x) + \tfrac{1}{2}E(\delta x)^2\,\pi''(x).$$

Under the assumptions we have made,

$$E(\delta x) = (2N)^{-1}\alpha x(1-x)\{x + h(1-2x)\} + O(N^{-2}),$$
$$E(\delta x)^2 = (2N)^{-1}x(1-x) + O(N^{-2}).$$

Thus to the order of approximation we use, these calculations give

$$2\alpha\{x + h(1-2x)\}\pi'(x) + \pi''(x) = 0.$$

The solution of this equation, subject to the obvious boundary conditions
$\pi(0) = 0$, $\pi(1) = 1$, is

$$\pi(x) = \int_0^x \psi(y)\,dy \Big/ \int_0^1 \psi(y)\,dy, \tag{3.30}$$

where

$$\psi(y) = \exp(-\alpha y\{2h + y(1-2h)\}).$$

In the particular case $h = \tfrac{1}{2}$, for which the heterozygote is intermediate in
fitness between the two homozygotes, this reduces to

$$\pi(x) = \{1 - \exp(-\alpha x)\}/\{1 - \exp(-\alpha)\}. \tag{3.31}$$

It is of some interest to use this approximate formula to get some idea 
of the effect of the selective differences on the probability of fixation of 
$A_1$. Suppose for example that $N = 10^5$, $s = 10^{-4}$, and $x = 0.5$. Then
$\alpha = 20$ and, from (3.31), $\pi(0.5) = 0.999955$. By contrast, for $s = 0$ we have
$\pi(0.5) = 0.5$. Evidently the rather small selective advantage 0.0001, which
is no doubt too small to be observed in laboratory experiments, is never- 
theless large enough in evolutionary terms to have a significant effect on 
the fixation probability. Clearly this occurs because, while selection might 
have only a minor effect in any generation, the number of generations until 
fixation occurs is so very large that the cumulative effect of selection is 
considerable. We consider this problem at greater length later when more 
general models are considered and when a more powerful theory is available 
to handle them. 
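The numerical example above can be reproduced directly from (3.31), and the general formula (3.30) can be evaluated by numerical integration for any dominance parameter $h$. The following is a minimal sketch; the function names and the choice of the midpoint rule are illustrative assumptions.

```python
import math

def fixation_additive(x, alpha):
    """Formula (3.31), the case h = 1/2, with alpha = 2Ns."""
    if alpha == 0:
        return x  # neutral limit: fixation probability is the frequency
    return (1 - math.exp(-alpha * x)) / (1 - math.exp(-alpha))

def fixation_general(x, alpha, h, steps=20000):
    """Formula (3.30), both integrals estimated by the midpoint rule."""
    def psi(y):
        return math.exp(-alpha * y * (2 * h + y * (1 - 2 * h)))
    dy = 1.0 / steps
    total = partial = 0.0
    for k in range(steps):
        y = (k + 0.5) * dy
        w = psi(y) * dy
        total += w
        if y < x:
            partial += w
    return partial / total
```

Calling `fixation_additive(0.5, 20)` reproduces the value quoted in the text for $N = 10^5$, $s = 10^{-4}$, and `fixation_general(0.5, 20, 0.5)` should agree with it closely, since (3.30) reduces to (3.31) when $h = \tfrac{1}{2}$.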



3.3 The Cannings (Exchangeable) Model: Two 
Alleles 

An important generalization of the Wright-Fisher form of model was introduced
by Cannings (1974). We consider a "population" of genes of fixed
size $2N$, reproducing at time points $t = 0, 1, 2, 3, \ldots$. The stochastic rule
determining the population structure at time $t + 1$ is quite general, provided
that any subset of genes at time $t$ has the same distribution of "descendant"
genes at time $t + 1$ as any other subset of the same size. Thus, if the $i$th
gene leaves $y_i$ descendant genes we require only that $y_1 + \cdots + y_{2N} = 2N$
and that the distribution of $(y_i, y_j, \ldots, y_k)$ be independent of $i, j, \ldots, k$. In
particular all genes must have the same offspring probability distribution.
This distribution must have mean 1 and we denote the variance of this
distribution by $\sigma^2$. This interpretation of $\sigma^2$ is used throughout this book
when Cannings models are considered. In some Cannings models a gene
present at time $t$ can also be present at time $t + 1$, and is then counted as
one of its own descendants. An example of this is discussed later.

The Wright-Fisher model (1.48) is a particular case of the Cannings
model, since in the model (1.48) $(y_1, y_2, \ldots, y_{2N})$ have a symmetric multinomial
distribution. However the Cannings model is more general and realistic
than the Wright-Fisher model.

Our first calculation concerning the Cannings model relates to eigenvalues.
Let the genes be divided into two allelic classes, $A_1$ and $A_2$, and let
$X_t$ be the number of $A_1$ genes at time $t$. Then we have



Theorem 3.1 (Cannings (1974)). If

$$p_{ij} = \text{Prob}\{X_{t+1} = j \mid X_t = i\}, \quad i, j = 0, 1, 2, \ldots, 2N,$$

then the eigenvalues of the matrix $\{p_{ij}\}$ are

$$\lambda_0 = 1, \quad \lambda_j = E(y_1 y_2 \cdots y_j), \quad j = 1, 2, \ldots, 2N. \tag{3.32}$$

Since we use this theorem, or generalizations of it, several times below we
reproduce here a proof of it, following Cannings (1974).



Proof. Let $P = \{p_{ij}\}$. Suppose that a nonsingular matrix $Z$ and an upper
triangular matrix $A$ can be found such that $PZ = ZA$. Since this equation
implies $P = ZAZ^{-1}$, the eigenvalues of $P$ are identical to those of $A$ which,
because of the special nature of $A$, are its diagonal elements. Consider now
the nonsingular matrix $Z$, defined by

$$Z = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
1 & 1 & 1^2 & 1^3 & \cdots & 1^{2N} \\
1 & 2 & 2^2 & 2^3 & \cdots & 2^{2N} \\
1 & 3 & 3^2 & 3^3 & \cdots & 3^{2N} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
1 & 2N & (2N)^2 & (2N)^3 & \cdots & (2N)^{2N}
\end{pmatrix}.$$

With this definition of $Z$ the $(i,j)$th element of $PZ$ is

$$\sum_k p_{ik} k^j,$$

which can be written

$$E[\{X(t+1)\}^j \mid X(t) = i].$$

Similarly the $(i,j)$th element of $ZA$ is of the form

$$\sum_{k=0}^{j} a_{kj} i^k,$$

which may be written as

$$a_{jj} i^{[j]} + \text{terms in } i^{j-1}, i^{j-2}, \ldots$$

Here $i^{[j]} = i(i-1)(i-2)\cdots(i-j+1)$. It follows from this that if we can
write

$$E[\{X(t+1)\}^j \mid X(t) = i] = a_{jj} i^{[j]} + \text{terms in } i^{j-1}, i^{j-2}, \ldots \tag{3.33}$$

then the $a_{jj}$ ($j = 0, 1, 2, \ldots, 2N$) are the eigenvalues of $P$. In the Cannings
model,

$$E[\{X(t+1)\}^j \mid X(t) = i] = E(y_1 + y_2 + \cdots + y_i)^j = \cdots + i^{[j]} E(y_1 y_2 \cdots y_j),$$

and it follows that a representation of the form (3.33) is indeed possible for
this model, with

$$a_{jj} = E(y_1 y_2 \cdots y_j), \quad j = 0, 1, 2, \ldots, 2N.$$

This completes the proof of the theorem. Cannings also asserted that except
in the trivial case $y_j = 1$, the eigenvalues obey the inequalities

$$1 = \lambda_0 = \lambda_1 > \lambda_2 > \lambda_3 > \cdots > \lambda_k = \lambda_{k+1} = \cdots = \lambda_{2N} = 0$$

for some $k$. However, Gladstien (1978) demonstrated that this is not quite
true, and that all that can be asserted is that

$$1 = \lambda_0 = \lambda_1 > \lambda_2 > \lambda_3 > \cdots > \lambda_k = \lambda_{k+1} = \cdots = \lambda_{2N}.$$



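It was noted above that in the simple Wright-Fisher model (1.48), any
set $y_1, y_2, \ldots, y_j$ has a multinomial distribution with index $2N$ and common
parameter $(2N)^{-1}$. This implies that if we write

$$\frac{(2N)!}{y_1!\,y_2! \cdots y_j!\,(2N - y_1 - \cdots - y_j)!} = \binom{2N}{\mathbf{y}},$$

the eigenvalue $\lambda_j$, $j = 1, 2, \ldots, 2N$ is given by

$$\lambda_j = \sum y_1 y_2 \cdots y_j \binom{2N}{\mathbf{y}} \left(\frac{1}{2N}\right)^{\sum y_i} \left(1 - \frac{j}{2N}\right)^{2N - \sum y_i}
= (2N)(2N-1)\cdots(2N-j+1)/(2N)^j. \tag{3.34}$$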



This confirms the values given in (3.15), found originally by other methods. 
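Theorem 3.1 can be checked exactly for a small Wright-Fisher population by confirming that each value in (3.34) is a root of the characteristic equation of the transition matrix. The sketch below is illustrative: it assumes the binomial form of the model (1.48), and the function names and the use of exact Gaussian elimination are choices made here, not taken from the text.

```python
from fractions import Fraction
from math import comb

def wf_matrix(two_n):
    """Exact binomial Wright-Fisher transition matrix for 2N genes."""
    return [[comb(two_n, j) * Fraction(i, two_n) ** j
             * Fraction(two_n - i, two_n) ** (two_n - j)
             for j in range(two_n + 1)] for i in range(two_n + 1)]

def det(M):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    M = [row[:] for row in M]
    n, d = len(M), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d

def cannings_eigenvalue(two_n, j):
    """lambda_j of (3.34); the empty product gives lambda_0 = 1."""
    lam = Fraction(1)
    for k in range(j):
        lam *= Fraction(two_n - k, two_n)
    return lam

def char_poly_at(P, lam):
    """det(P - lam I), which vanishes exactly when lam is an eigenvalue."""
    n = len(P)
    return det([[P[r][c] - (lam if r == c else 0) for c in range(n)]
                for r in range(n)])
```

For $2N = 4$ the values are $1, 1, 3/4, 3/8, 3/32$, and `char_poly_at` should return exactly zero at each. The trace of the matrix, which equals the sum of all eigenvalues, gives a second exact check.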

Theorem 3.1 shows that for the Cannings model, the leading nonunit
eigenvalue is $\lambda_2 = E(y_1 y_2)$ where, as defined before Theorem 3.1, $y_i$ is
the number of descendant genes of the $i$th gene in the population. Now
$\sum y_j \equiv 2N$, so that the variance of $(\sum y_j)$ is 0. Then by symmetry,

$$2N \operatorname{var}(y_i) + 2N(2N-1) \operatorname{covar}(y_i, y_j) = 0.$$

This implies that

$$\operatorname{covar}(y_i, y_j) = -\sigma^2/(2N-1), \tag{3.35}$$

where $\sigma^2 = \operatorname{var}(y_i)$. Immediately then,

$$\lambda_2 = E(y_1 y_2)
= \operatorname{covar}(y_1, y_2) + E(y_1)E(y_2)
= 1 - \sigma^2/(2N-1). \tag{3.36}$$

To confirm this formula we observe that in the Wright-Fisher model, $y_i$
has a binomial distribution with index $2N$ and parameter $(2N)^{-1}$. Thus

$$\lambda_2 = 1 - \{1 - (2N)^{-1}\}/(2N-1) = 1 - (2N)^{-1},$$

agreeing with the "$j = 2$" case in the expression in (3.34).

Other properties of the Cannings model follow easily. For example, it is
clear by symmetry that the probability of eventual fixation of any allele in
such a model must be its initial frequency. Further, suppose that there are
$X(t)$ $A_1$ genes in the Cannings model at time $t$, and write $X(t) = i$ for
convenience. If we relabel genes so that the first $i$ genes are $A_1$,

$$\operatorname{var}\{X(t+1) \mid X(t)\} = \operatorname{var}(y_1 + \cdots + y_i)
= i\sigma^2 + i(i-1)\operatorname{covar}(y_1, y_2)
= i(2N-i)\sigma^2/(2N-1), \tag{3.37}$$

from (3.35). If $x(t) = X(t)/2N$, it follows that

$$\operatorname{var}\{x(t+1) \mid x(t)\} = x(t)\{1 - x(t)\}\sigma^2/(2N-1). \tag{3.38}$$
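As a check on (3.37), the conditional variance in the Wright-Fisher model can be computed by direct enumeration and compared with the Cannings expression, using the Wright-Fisher offspring variance $\sigma^2 = 1 - (2N)^{-1}$ noted above. The function names are illustrative, and the binomial form of (1.48) is assumed.

```python
from fractions import Fraction
from math import comb

def binomial_variance(two_n, i):
    """Exact variance of X(t+1) given X(t) = i under binomial sampling."""
    p = Fraction(i, two_n)
    pmf = [comb(two_n, j) * p ** j * (1 - p) ** (two_n - j)
           for j in range(two_n + 1)]
    mean = sum(j * w for j, w in enumerate(pmf))
    return sum((j - mean) ** 2 * w for j, w in enumerate(pmf))

def cannings_variance(two_n, i, sigma2):
    """Right-hand side of (3.37) for offspring-number variance sigma2."""
    return i * (two_n - i) * sigma2 / (two_n - 1)
```

For every $i$ between 0 and $2N$ the two functions should return the same rational number when `sigma2 = 1 - Fraction(1, two_n)`.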

To find the eigenvalues of the matrices defined by (3.16) and (3.24) we
use a second theorem due to Cannings (1974). Suppose that if mutation
does not exist, the conditions for Theorem 3.1 hold. Now assume that $A_1$
mutates to $A_2$ at rate $u$, with reverse mutation at rate $v$. Write $x_i = y_i + z_i$,
where $y_i = 1$ or 0 depending on whether or not the $i$th gene at time $t$
continues to exist at time $t + 1$. Thus, $y_i \equiv 0$ in the model (3.16), but we
are considering now more general conditions than those specified by this
equation. The variable $z_i$ is the number of offspring genes from the $i$th gene
at time $t$. If this gene is of type $A_1$, define $z_{i1}$ as the (random) number of
its $A_1$ (that is, nonmutated) offspring: $z_{i1}$ has a distribution which depends
on $z_i$. Similarly if the $i$th gene is of type $A_2$ let $z_{i2}$ be the random number
of its $A_1$ (that is, mutant) offspring. Then we have

Theorem 3.2 (Cannings (1974)). The eigenvalues of the matrix $P$ describing
the stochastic behavior of the number of $A_1$ genes are

$$\lambda_0 = 1, \quad \lambda_j = \sum \text{Prob}(z_1, \ldots, z_j)\, E\Big\{\prod_{i=1}^{j} (y_i + z_{i1} - z_{i2}) \,\Big|\, z_1, \ldots, z_j\Big\} \quad (j = 1, 2, \ldots, 2N). \tag{3.39}$$

The proof of this theorem is omitted here. In the model defined by (3.16)
and (3.24), $y_i \equiv 0$ and $z_1, \ldots, z_j$ have a multinomial distribution with
index $2N$ and common parameter $(2N)^{-1}$. Further, given $z_i$, $z_{i1}$ and $z_{i2}$
have binomial distributions with respective parameters $1 - u$ and $v$. Thus

$$E(z_{i1} - z_{i2} \mid z_i) = (1 - u - v)z_i$$

and

$$\lambda_j = \sum \text{Prob}(z_1, \ldots, z_j)(1-u-v)^j z_1 \cdots z_j
= (1-u-v)^j E(z_1 \cdots z_j) \tag{3.40}$$
$$= (1-u-v)^j\{2N(2N-1)\cdots(2N-j+1)/(2N)^j\}, \quad j = 1, 2, \ldots, 2N.$$

The conclusion of (3.34) has been used in reaching this formula. The leading
nonunit eigenvalue $\lambda_1$ is $1 - u - v$ and is thus independent of $N$. This
is extremely close to unity and suggests a very slow rate of approach to
stationarity in this model. The eigenvalues (3.40) apply also in the one-way
mutation model, for which we simply put $v = 0$ in (3.40).

The conditional branching process model is a particular case of the
Cannings model. In this model it is supposed that each gene produces
$k$ offspring with probability $f_k$ ($k = 0, 1, 2, 3, \ldots$), with the numbers of offspring
from different parents being assumed independent. If $f(s) = \sum f_k s^k$,
the generating function of the distribution of the total number of offspring
genes is $[f(s)]^{2N}$. We now make the condition that the total number of
such offspring is $2N$. If at time $t$ there were $i$ $A_1$ genes, the probability $p_{ij}$
that at time $t + 1$ there will be $j$ $A_1$ genes is

$$p_{ij} = \frac{\text{coeff } t^j s^{2N} \text{ in } [f(ts)]^i [f(s)]^{2N-i}}{\text{coeff } s^{2N} \text{ in } [f(s)]^{2N}}. \tag{3.41}$$

Transition probabilities of this form were introduced by Moran and Watterson
(1959), who used them to find explicit expressions for the leading
nonunit eigenvalue in dioecious populations with various family structures.
Extensions to this theory were given by Feldman (1966).

Karlin and McGregor (1965) have analyzed the conditional branching
process model in detail. They show in particular that the eigenvalues of
the matrix $\{p_{ij}\}$ are

$$\lambda_0 = \lambda_1 = 1, \quad \lambda_k = \frac{\text{coeff } s^{2N-k} \text{ in } [f(s)]^{2N-k}[f'(s)]^k}{\text{coeff } s^{2N} \text{ in } [f(s)]^{2N}}, \quad k = 2, 3, \ldots, 2N. \tag{3.42}$$

These must agree with the values found in (3.32), since a conditional
branching process is a Cannings model. We check that this agreement holds
for the eigenvalue $\lambda_2$. It is clear from (3.41) that

$$\sum_j p_{ij} t^j = \frac{\text{coeff } s^{2N} \text{ in } [f(ts)]^i [f(s)]^{2N-i}}{\text{coeff } s^{2N} \text{ in } [f(s)]^{2N}}.$$
Differentiating twice with respect to $t$ and putting $t = 1$,

$$\sum_j j(j-1)p_{ij} = \lambda_2 i(i-1) + \eta_2 i, \tag{3.43}$$

where $\lambda_2$ is defined by (3.42) and $\eta_2$ is some constant independent of $i$
and $j$. Now $\sum_j j p_{1j} = 1$ by symmetry, and $\sum_j (j-1)^2 p_{1j} = \sigma^2$, where $\sigma^2$ is
defined after (3.35). Thus putting $i = 1$ in (3.43) we get $\eta_2 = \sigma^2$ and then
putting $i = 2$, we get

$$\sum_j j(j-1)p_{2j} = 2\lambda_2 + 2\sigma^2,$$

$$\sum_j j^2 p_{2j} = 2\lambda_2 + 2\sigma^2 + 2. \tag{3.44}$$

But the left-hand side in (3.44) is $E(y_1 + y_2)^2$, where $y_i$ is the random
number of offspring genes left by parental gene $i$. It follows that

$$2 + 2\sigma^2 + 2E(y_1 y_2) = 2\lambda_2 + 2\sigma^2 + 2$$

or

$$\lambda_2 = E(y_1 y_2),$$

as required. Parallel calculations can be made for the remaining eigenvalues,
but we do not pursue the details here.
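The transition probabilities (3.41) can be computed for any offspring distribution by polynomial multiplication, because the coefficient of $t^j s^{2N}$ in $[f(ts)]^i[f(s)]^{2N-i}$ is the product of the coefficient of $s^j$ in $[f(s)]^i$ and the coefficient of $s^{2N-j}$ in $[f(s)]^{2N-i}$. The sketch below is illustrative only (function names are assumptions). As a test case it uses Poisson offspring numbers, for which conditioning on a total of $2N$ is known to recover binomial Wright-Fisher sampling; the common factor $e^{-1}$ in the $f_k$ cancels in (3.41) and is omitted.

```python
from fractions import Fraction
from math import comb, factorial

def poly_pow(f, n, deg):
    """Coefficients of s^0..s^deg in f(s)^n, with f a coefficient list."""
    out = [Fraction(1)] + [Fraction(0)] * deg
    for _ in range(n):
        new = [Fraction(0)] * (deg + 1)
        for a, ca in enumerate(out):
            if ca:
                for b, cb in enumerate(f[:deg + 1 - a]):
                    new[a + b] += ca * cb
        out = new
    return out

def conditional_branching_matrix(f, two_n):
    """Transition probabilities (3.41) for offspring weights f_0, f_1, ..."""
    denom = poly_pow(f, two_n, two_n)[two_n]
    P = []
    for i in range(two_n + 1):
        a = poly_pow(f, i, two_n)            # coefficients of [f(s)]^i
        b = poly_pow(f, two_n - i, two_n)    # coefficients of [f(s)]^(2N-i)
        P.append([a[j] * b[two_n - j] / denom for j in range(two_n + 1)])
    return P

def poisson_weights(two_n):
    """Unnormalized Poisson(1) weights 1/k!, truncated at k = 2N."""
    return [Fraction(1, factorial(k)) for k in range(two_n + 1)]
```

Only the coefficients $f_0, \ldots, f_{2N}$ are needed, since higher terms cannot contribute to the coefficient of $s^{2N}$. Other offspring distributions can be supplied in the same way to explore how the matrix departs from the Wright-Fisher form.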



3.4 Moran Models: Two Alleles 

The conclusions reached so far depend on the assumption that the appropriate
model to describe the stochastic behavior of the number of $A_1$ genes
is one or other form of the model (3.16). Different conclusions are reached
for models other than these, and we consider now a model due to Moran 
(1958) for which this is so. Moran’s model has the additional advantage of 
allowing explicit expressions for many quantities of evolutionary interest, 
although, strictly, it applies only for haploid populations. 

Consider then a haploid population in which, at time points $t = 1, 2,
3, \ldots$, an individual is chosen at random to reproduce. After reproduction
has occurred, an individual is chosen to die (possibly the reproducing in- 
dividual but not the new offspring individual). This model is an example 
of birth and death models, studied extensively in the stochastic process 
literature. As is discussed later, the model can be generalized by allowing 
mutation and selection, the latter being introduced by weighting the prob- 
ability that an individual of a specific genotype is chosen either to give 
birth or to die. 
We consider first the simplest case where there is no selection or mutation.
Suppose the population consists of $2N$ haploid individuals (we use
this notation to allow direct comparison with the diploid case), each of
which is either $A_1$ or $A_2$. Suppose also that at time $t$, the number of $A_1$
individuals is $i$. Then at time $t + 1$ there will be $i - 1$ $A_1$ individuals if
an $A_2$ is chosen to give birth and an $A_1$ individual is chosen to die. The
probability of this, under our assumptions, is

$$p_{i,i-1} = i(2N-i)/(2N)^2. \tag{3.45}$$

Similar reasoning shows that

$$p_{i,i+1} = i(2N-i)/(2N)^2, \tag{3.46}$$

$$p_{i,i} = \{i^2 + (2N-i)^2\}/(2N)^2. \tag{3.47}$$

The matrix defined by these transition probabilities is a continuant, so that
much of the theory of Section 2.12 can be applied to it. In the notation of
that section,

$$\lambda_i = \mu_i = i(2N-i)/(2N)^2, \quad \rho_i = 1, \quad i = 0, 1, 2, \ldots, 2N. \tag{3.48}$$

It follows that the probability $\pi_i$ of fixation of $A_1$, given currently $i$ $A_1$
individuals, is

$$\pi_i = i/2N, \tag{3.49}$$

and that using the notation of Section 2.12,

$$\bar t_{ij} = 2N(2N-i)/(2N-j), \quad j = 1, 2, \ldots, i,$$
$$\bar t_{ij} = 2Ni/j, \quad j = i+1, \ldots, 2N-1. \tag{3.50}$$

Thus immediately

$$\bar t_i = 2N(2N-i)\sum_{j=1}^{i}(2N-j)^{-1} + 2Ni\sum_{j=i+1}^{2N-1} j^{-1}. \tag{3.51}$$

The corresponding quantities conditional on eventual fixation of $A_1$ are

$$t^*_{ij} = 2N(2N-i)j/\{i(2N-j)\}, \quad j = 1, 2, \ldots, i,$$
$$t^*_{ij} = 2N, \quad j = i+1, \ldots, 2N-1, \tag{3.52}$$

$$\bar t^*_i = 2N(2N-i)\,i^{-1}\sum_{j=1}^{i} j(2N-j)^{-1} + 2N(2N-i-1). \tag{3.53}$$

An interesting example of these formulas arises in the case $i = 1$, corresponding
to a unique $A_1$ mutant in an otherwise purely $A_2$ population.
Here $t^*_{1j} = 2N$ for all $j$, so that given that the mutant is eventually fixed,
the number of $A_1$ genes takes, on average, each of the values $1, 2, \ldots, 2N-1$
a total of $2N$ times. The conditional mean fixation time is given by

$$\bar t^*_1 = 2N(2N-1) \tag{3.54}$$

birth and death events. The variance of the conditional absorption time
can also be written down but we do not do so here.

The eigenvalues of the matrix defined by (3.45)-(3.47) can be found by using Theorem 3.1.
Take any collection of $j$ genes and note that the probability that one of
these is chosen to reproduce is $j/2N$, with the same probability that one
is chosen to die. For this model a gene can be (and indeed usually is) one
of its own "descendants". Using the notation of Theorem 3.1, the product
$y_1 y_2 \cdots y_j$ can take only three values:

0 if one of these genes is chosen to die and the gene so chosen is not
chosen to reproduce,

2 if one of the genes is chosen to reproduce and none is chosen to die,

1 otherwise.

Thus $\lambda_0 = 1$ and

$$\lambda_j = E(y_1 y_2 \cdots y_j)$$
$$= 0\{j(2N-1)/(2N)^2\} + 2j(2N-j)/(2N)^2 + 1 - j(4N-j-1)/(2N)^2$$
$$= 1 - j(j-1)/(2N)^2, \quad j = 1, 2, \ldots, 2N. \tag{3.55}$$

Various expressions for the corresponding eigenvectors, first found by Watterson
(1961) using Chebychev polynomials, and later by Gladstien (1978)
using other methods, have been given. We are particularly interested in
the largest nonunit eigenvalue and its associated eigenvectors. The required
eigenvalue is

$$\lambda_2 = 1 - 2/(2N)^2, \tag{3.56}$$

and elementary calculations show that the corresponding right eigenvector
$r$ and left eigenvector $\ell'$ are given by

$$r = (0,\ 1(2N-1),\ 2(2N-2),\ \ldots,\ i(2N-i),\ \ldots,\ 1(2N-1),\ 0)',$$

$$\ell' = (-\tfrac{1}{2}(2N-1),\ 1,\ 1,\ 1,\ \ldots,\ 1,\ -\tfrac{1}{2}(2N-1)).$$

Thus the asymptotic distribution of the number $X_t$ of $A_1$ genes for large
$t$, given $X_t \neq 0, 2N$, is uniform over the values $\{1, 2, 3, \ldots, 2N-1\}$.
The fact that $\lambda_2$ is very close to unity agrees with the very large mean
absorption times (3.51) for intermediate values of $i$.
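The eigenvalue (3.56) and its two eigenvectors can be verified exactly for a small population by building the Moran transition matrix (3.45)-(3.47). The sketch below uses rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction

def moran_matrix(two_n):
    """Transition matrix of the neutral Moran model, (3.45)-(3.47)."""
    P = [[Fraction(0)] * (two_n + 1) for _ in range(two_n + 1)]
    for i in range(two_n + 1):
        step = Fraction(i * (two_n - i), two_n ** 2)
        if i > 0:
            P[i][i - 1] = step
        if i < two_n:
            P[i][i + 1] = step
        P[i][i] = 1 - 2 * step
    return P

def right_vector(two_n):
    """r_i = i(2N - i)."""
    return [Fraction(i * (two_n - i)) for i in range(two_n + 1)]

def left_vector(two_n):
    """Entries 1 in the interior and -(2N - 1)/2 at the two boundaries."""
    edge = -Fraction(two_n - 1, 2)
    return [edge] + [Fraction(1)] * (two_n - 1) + [edge]

def times_right(P, r):
    """The column vector P r."""
    return [sum(p * x for p, x in zip(row, r)) for row in P]

def times_left(l, P):
    """The row vector l' P."""
    n = len(P)
    return [sum(l[i] * P[i][j] for i in range(n)) for j in range(n)]
```

With `lam2 = 1 - Fraction(2, two_n ** 2)`, both `times_right(P, r)` and `times_left(l, P)` should equal `lam2` times the corresponding vector, entry by entry. The constant interior entries of the left eigenvector are what give the uniform conditional distribution described above.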

If mutation from $A_1$ to $A_2$ is allowed (at rate $u$), with no reverse mutation,
$A_1$ must eventually become lost, and interest centers on properties of
the time for this to occur. The model is now amended to

$$p_{i,i-1} = \{i(2N-i) + ui^2\}/(2N)^2 = \mu_i,$$
$$p_{i,i+1} = i(2N-i)(1-u)/(2N)^2 = \lambda_i,$$
$$p_{i,i} = 1 - p_{i,i-1} - p_{i,i+1}.$$

Equation (2.160) can now be used to find $\bar t_{ij}$ and thus $\bar t_i$. We do not present
explicit expressions since it will be more useful, later, to proceed via approximations.
If mutation from $A_2$ to $A_1$ (at rate $v$) is also allowed, the
model becomes

$$p_{i,i-1} = \{i(2N-i)(1-v) + ui^2\}/(2N)^2 = \mu_i,$$
$$p_{i,i+1} = \{i(2N-i)(1-u) + v(2N-i)^2\}/(2N)^2 = \lambda_i, \tag{3.57}$$
$$p_{i,i} = 1 - p_{i,i-1} - p_{i,i+1}.$$

Here a stationary distribution arises for the number of $A_1$ genes in the
population, and the typical probability $\phi_j$ in this distribution $\phi$ is found,
from (2.162), to be

$$\phi_j = a_0\,\frac{(2N)!\,\Gamma\{j+A\}\,\Gamma\{B-j\}}{j!\,(2N-j)!\,\Gamma\{A\}\,\Gamma\{B\}}. \tag{3.58}$$

Here $\Gamma\{\cdot\}$ is the well-known gamma function, $A = 2Nv/(1-u-v)$, $B =
2N(1-v)/(1-u-v)$, $C = 2Nu/(1-u-v)$, $D = 2N/(1-u-v)$ and
$a_0 = \Gamma\{B\}\Gamma\{A+C\}/[\Gamma\{D\}\Gamma\{C\}]$. Although these expressions are exact
they are rather unwieldy, and we consider below a simple approximation
to $\phi_j$.
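The gamma-function form of the stationary distribution can be evaluated stably through logarithms and checked against the detailed balance relation $\phi_j \lambda_j = \phi_{j+1}\mu_{j+1}$ that holds for any birth and death chain. The sketch below is illustrative: the function names are assumptions, the constants $A$, $B$, $C$, $D$ follow the definitions in the text, and the normalizing constant is taken as $a_0 = \Gamma\{B\}\Gamma\{A+C\}/[\Gamma\{D\}\Gamma\{C\}]$.

```python
import math

def moran_rates(two_n, u, v):
    """Up-probabilities lam[i] and down-probabilities mu[i] of (3.57)."""
    lam = [(i * (two_n - i) * (1 - u) + v * (two_n - i) ** 2) / two_n ** 2
           for i in range(two_n + 1)]
    mu = [(i * (two_n - i) * (1 - v) + u * i ** 2) / two_n ** 2
          for i in range(two_n + 1)]
    return lam, mu

def moran_stationary(two_n, u, v):
    """Stationary probabilities phi_0..phi_2N of (3.58), via log-gamma."""
    w = 1 - u - v
    A = two_n * v / w
    B = two_n * (1 - v) / w
    C = two_n * u / w
    D = two_n / w
    log_a0 = (math.lgamma(B) + math.lgamma(A + C)
              - math.lgamma(D) - math.lgamma(C))
    phi = []
    for j in range(two_n + 1):
        log_phi = (log_a0 + math.lgamma(two_n + 1)
                   + math.lgamma(j + A) + math.lgamma(B - j)
                   - math.lgamma(j + 1) - math.lgamma(two_n - j + 1)
                   - math.lgamma(A) - math.lgamma(B))
        phi.append(math.exp(log_phi))
    return phi
```

The probabilities returned should sum to one, which confirms the normalizing constant, and should satisfy detailed balance to rounding error. Both $u$ and $v$ must be strictly positive for the gamma functions to be defined.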

The Markov chain defined by (3.57), having a stationary distribution and 
a continuant transition matrix, is automatically reversible, as shown by the 
closing remarks in Chapter 2. This is not necessarily true for other geneti- 
cal models: It can be shown, for example, that the Wright-Fisher Markov 
chain defined jointly by (3.16) and (3.24) is not reversible. What does re- 
versibility mean in genetical terms? All the theory we have considered so 
far is prospective , that is, given the current state of a Markov chain, proba- 
bility statements are made about its future behavior. Recent developments 
in population genetics theory often concern the retrospective behavior: The 
present state is observed, and questions are asked about the evolution lead- 
ing to this state. For reversible processes these two aspects have many 
properties in common, and information about the prospective behavior 
normally yields almost immediately useful information about the retro- 
spective behavior. We shall see later how the identity of prospective and 
retrospective probabilities can be used to advantage in discussing various 
evolutionary questions. 

The eigenvalues of (3.57) can be found by applying Theorem 3.2. Here
$y_i = 1$ unless the $i$th gene has been chosen to die, in which case $y_i = 0$.
Similarly $z_{i1}$ and $z_{i2}$ are zero unless the $i$th gene has been chosen to
reproduce. It is found after some calculation that $\lambda_0 = 1$ and

$$\lambda_j = 1 - \frac{j(u+v)}{2N} - \frac{j(j-1)(1-u-v)}{(2N)^2}, \quad j = 1, \ldots, 2N. \tag{3.59}$$

These eigenvalues apply also in the case $v = 0$. The leading nonunit eigenvalue
is $1 - (u+v)/(2N)$, and since $2N$ time units in the process we
consider may be thought to correspond to one generation in the Wright-Fisher
model, this agrees closely with the value $1 - u - v$ found in (3.40)
in that model.
We now obtain approximations for several of the above quantities. It is
evident from (3.51) that

$$\bar t(p) \approx -(2N)^2\{p \log p + (1-p)\log(1-p)\}, \tag{3.60}$$

where $p = i/2N$. The similarity between this formula and (3.5) is interesting.
A factor of $2N$ may be allowed in comparing the two to convert from
birth and death events to generations. There remains a further factor of 2
to explain, and we show later why this factor exists.
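The quality of the approximation (3.60) can be examined by evaluating the exact sum (3.51), which can itself be checked against the first-step equation $\lambda_i(2\bar t_i - \bar t_{i-1} - \bar t_{i+1}) = 1$ with $\lambda_i = i(2N-i)/(2N)^2$. The following sketch is illustrative; the function names are assumptions.

```python
from fractions import Fraction
import math

def mean_absorption_exact(two_n, i):
    """Formula (3.51), in birth and death events; zero at the boundaries."""
    if i <= 0 or i >= two_n:
        return Fraction(0)
    left = sum(Fraction(1, two_n - j) for j in range(1, i + 1))
    right = sum(Fraction(1, j) for j in range(i + 1, two_n))
    return two_n * (two_n - i) * left + two_n * i * right

def mean_absorption_approx(two_n, i):
    """Approximation (3.60) with p = i/2N, for 0 < i < 2N."""
    p = i / two_n
    return -two_n ** 2 * (p * math.log(p) + (1 - p) * math.log(1 - p))

def first_step_residual(two_n, i):
    """lambda_i (2 t_i - t_{i-1} - t_{i+1}), which should equal 1."""
    step = Fraction(i * (two_n - i), two_n ** 2)
    t = mean_absorption_exact
    return step * (2 * t(two_n, i) - t(two_n, i - 1) - t(two_n, i + 1))
```

For $2N = 200$ and $i = 100$ the approximation is within about half a percent of the exact value, and it improves as $2N$ increases.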

Consider next the expression (3.58). Put $x = j/(2N)$, $u = \alpha/(2N)$,
$v = \beta/(2N)$ and let $j$ and $2N$ increase indefinitely with $x$, $\alpha$ and $\beta$ fixed.
Using the Stirling approximation $\Gamma\{y+a\}/\Gamma\{y\} \sim y^a$ for large $y$, moderate
$a$, the stationary probability $\phi_j$ in (3.58) becomes, approximately,

$$\phi_j \approx (2N)^{-1}\,\frac{\Gamma\{\alpha+\beta\}}{\Gamma\{\alpha\}\Gamma\{\beta\}}\,x^{\beta-1}(1-x)^{\alpha-1}, \tag{3.61}$$

at least for values of $x$ not extremely close to 0 or 1. Clearly this approximate
expression is far simpler than the exact value (3.58). The values for
$\bar t_{ij}$ may be calculated from (2.160) and (3.57), and from these the value of
$\bar t_i$. This is

$$\bar t_i \approx (2N)^2(1-\theta)^{-1}\left(\int_0^p x^{-1}\{(1-x)^{\theta-1} - 1\}\,dx
+ \int_p^1 x^{-1}(1-x)^{\theta-1}\{1 - (1-p)^{1-\theta}\}\,dx\right) \tag{3.62}$$

birth and death events, where $p = i/(2N)$, $x = j/(2N)$ and $\theta$ is defined for
the diffusion approximation to this Moran model as $2Nu$. In the particular
case $p = (2N)^{-1}$ this is, to a close approximation,

$$\bar t_1 \approx 2N\left(1 + \int_{(2N)^{-1}}^{1} x^{-1}(1-x)^{\theta-1}\,dx\right) \tag{3.63}$$

birth and death events. When $\theta = 1$ the form of $\bar t_i$ may be found by
application of L'Hospital's rule.

Selection can be incorporated in this model by assuming differential birth
rates or differential death rates. The two approaches give similar results so
we consider here only the case where death rates differ. To do this we
suppose that if at any time there are $i$ $A_1$ genes in the population the
probability that the next individual chosen to die is $A_1$ is

$$\mu_1 i/\{\mu_1 i + \mu_2(2N-i)\}. \tag{3.64}$$

If $\mu_1 = \mu_2$ there is no selection while if $\mu_1 < \mu_2$ the allele $A_1$ has a selective
advantage over $A_2$. It follows that the transition matrix for the number of
$A_1$ individuals has elements

$$p_{i,i-1} = \mu_1 i(2N-i)/[2N\{\mu_1 i + \mu_2(2N-i)\}],$$
$$p_{i,i+1} = \mu_2 i(2N-i)/[2N\{\mu_1 i + \mu_2(2N-i)\}], \tag{3.65}$$
$$p_{i,i} = 1 - p_{i,i-1} - p_{i,i+1}.$$

The matrix defined by (3.65) is a continuant, and the theory of Section
2.12 applies. In the notation of that section,

$$\rho_0 = 1, \quad \rho_k = (\mu_1/\mu_2)^k,$$

and the probability $\pi_i$ of eventual fixation of $A_1$, given an initial number
of $i$ $A_1$ individuals, is

$$\pi_i = \{1 - (\mu_1/\mu_2)^i\}/\{1 - (\mu_1/\mu_2)^{2N}\}. \tag{3.66}$$

If now $\mu_1/\mu_2 = 1 - \tfrac{1}{2}s$, where $s$ is small and positive, $A_1$ has a slight
selective advantage over $A_2$ and (3.66) can be approximated by

$$\pi(x) \approx \{1 - \exp(-\tfrac{1}{2}\alpha x)\}/\{1 - \exp(-\tfrac{1}{2}\alpha)\}, \tag{3.67}$$

where $x = i/2N$ and $\alpha = 2Ns$. This formula differs from (3.31) by a factor
of 2 in the exponents. This is not because the selective differences differ
by a factor of 2, since indeed they do not, but from a more deep-rooted
difference between the two models which we examine later.
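The exact fixation probability (3.66) can be confirmed by checking that it satisfies the first-step equation for the chain (3.65), and it can then be compared with the approximation (3.67). The sketch below is illustrative; the function names are assumptions and `two_n` stands for $2N$.

```python
from fractions import Fraction
import math

def moran_selection_probs(two_n, i, mu1, mu2):
    """(p_{i,i-1}, p_{i,i+1}) of (3.65)."""
    denom = two_n * (mu1 * i + mu2 * (two_n - i))
    down = mu1 * i * (two_n - i) / denom
    up = mu2 * i * (two_n - i) / denom
    return down, up

def fixation_exact(two_n, i, mu1, mu2):
    """Formula (3.66); requires mu1 != mu2."""
    r = mu1 / mu2
    return (1 - r ** i) / (1 - r ** two_n)

def fixation_approx(two_n, i, s):
    """Formula (3.67) with x = i/2N and alpha = 2Ns = two_n * s."""
    x, alpha = i / two_n, two_n * s
    return (1 - math.exp(-alpha * x / 2)) / (1 - math.exp(-alpha / 2))

def first_step_holds(two_n, i, mu1, mu2):
    """Check pi_i (down + up) == down * pi_{i-1} + up * pi_{i+1}."""
    down, up = moran_selection_probs(two_n, i, mu1, mu2)
    pi = lambda k: fixation_exact(two_n, k, mu1, mu2)
    return pi(i) * (down + up) == down * pi(i - 1) + up * pi(i + 1)
```

Passing `mu1` and `mu2` as `Fraction` values makes the first-step check exact. For $2N = 200$, $s = 0.01$ and $i = 100$ the exact and approximate fixation probabilities agree to about three decimal places.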

It is possible to use (3.65) in conjunction with the continuant formulas 
of Section 2.12 to get expressions for mean absorption times, conditional 
mean absorption times, and so on. We do not do this here since the formulas 
become very unwieldy and uninformative, and since also we later consider 
simple approximations for these quantities. It may finally be remarked that 
no formula is known for the eigenvalues of the matrix defined by (3.65). 



3.5 K-Allele Wright-Fisher Models

The models considered so far can easily be extended to allow K different 
alleles at the locus in question, where K is an arbitrary positive integer. 
In this case the population configuration at any time can be described by
a vector $(X_1, X_2, \ldots, X_K)$, where $X_i$ is the number of genes of allelic type
$A_i$. If we assume, as is usual, that $X_1 + X_2 + \cdots + X_K = 2N$, only $K - 1$
elements in the above vector are independent. It is however convenient to
retain all elements in the vector. The most interesting cases of these models
arise when there is no mutation and a generalization of the Cannings model
determines the evolution of the population. In this case any allele $A_i$ can
be treated on its own, all other alleles being classed simply as non-$A_i$, and
much of the theory of the preceding sections can be applied. One problem
for which the preceding theory is inadequate is to find the mean time until
loss of the first allele lost, the mean time until loss of the second allele lost,
and so on. This more complex problem and various associated problems are
discussed in Section 5.10.

We consider in detail only the K-allele generalization of the Wright-Fisher
model (1.48), namely



$$\text{Prob}\{Y_i \text{ genes of allele } A_i \text{ at time } t+1 \mid X_i \text{ genes of allele } A_i \text{ at time } t,\ i = 1, 2, \ldots, K\}
= \frac{(2N)!}{Y_1!\,Y_2! \cdots Y_K!}\,\psi_1^{Y_1}\psi_2^{Y_2}\cdots\psi_K^{Y_K}, \tag{3.68}$$

where $\psi_i = X_i/(2N)$. In this case the model (3.68) is in effect a Cannings
model and the theory for the Cannings model given above, or straightforward
generalizations of it, can be used. The eigenvalues of the matrix
defined by (3.68) are precisely the values in (3.34), where now $\lambda_j$ has multiplicity
$(K + j - 2)!/\{(K-2)!\,j!\}$ ($j = 2, 3, \ldots, 2N$). The eigenvalue
$\lambda_0 = 1$ has total multiplicity $K$. These eigenvalues have the interesting
interpretation (Littler (1975)) that

$$\text{Prob}\{\text{at least } j \text{ allelic types remain present at time } t\} \sim \text{const } \lambda_j^t. \tag{3.69}$$



Expressions for the mean times between losses of alleles are given explicitly 
later (see (5.122) and (5.123)), where it will be shown that the eigenvalue 
expression (3.69) does not give useful information about these mean times. 

When mutation exists between all alleles there will exist a multi-dimensional
stationary distribution of allelic numbers. The means, variances
and covariances in this distribution can be found by procedures
analogous to those leading to (3.25) and (3.26). We consider in detail only
the case where mutation is symmetric: In this case the probability that any
gene mutates is assumed to be $u$, and given that a gene of allelic type $A_i$ has
mutated, the probability that the new mutant is of type $A_j$ is $(K-1)^{-1}$
($j \neq i$). By symmetry, the mean number of genes of allelic type $A_i$ in
the stationary distribution must be $2N/K$. However, it sometimes occurs
that this is not a likely value for the actual number of genes of any allelic
type to arise, and we see this best by finding the probability $F$ that two
genes taken at random from the population are of the same allelic type.
Generalizing the argument that led to (3.28) we find, ignoring terms of
order $u^2$, that

$$F = \big((2N)^{-1} + \{1 - (2N)^{-1}\}F\big)(1 - 2u) + \big(1 - (2N)^{-1}\big)(1-F)\big(2u/(K-1)\big).$$

If we write $\theta = 4Nu$, this gives

$$F \approx (K - 1 + \theta)/(K - 1 + K\theta). \tag{3.70}$$

This expression agrees with that in (3.28) for $K = 2$, and letting $K \to \infty$
we find

$$F \approx (1 + \theta)^{-1}. \tag{3.71}$$
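The linear recursion for $F$ in the symmetric K-allele model can be solved exactly and compared with (3.70) and with its limit (3.71). The following sketch is illustrative; the function names are assumptions and `two_n` stands for $2N$.

```python
from fractions import Fraction

def f_k_alleles_exact(two_n, u, K):
    """Solve F = (c + (1-c)F)(1-2u) + (1-c)(1-F)m for F,
    where c = 1/(2N) and m = 2u/(K-1) (terms of order u^2 ignored)."""
    c = Fraction(1, two_n)
    m = 2 * u / (K - 1)
    numerator = c * (1 - 2 * u) + (1 - c) * m
    denominator = 1 - (1 - c) * (1 - 2 * u) + (1 - c) * m
    return numerator / denominator

def f_k_alleles_approx(two_n, u, K):
    """Approximation (3.70), with theta = 4Nu = 2 * two_n * u."""
    theta = 2 * two_n * u
    return (K - 1 + theta) / (K - 1 + K * theta)
```

With $2N = 2000$ and $u = 10^{-4}$ (so that $\theta = 0.4$), the exact and approximate values agree closely for small $K$, the case $K = 2$ of (3.70) coincides with $(1+\theta)/(1+2\theta)$, and for very large $K$ the approximation approaches $(1+\theta)^{-1}$.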
This formula demonstrates a theme that will recur later. If $\theta$ is small then
$F \approx 1$. This implies that it is very likely that one or other allele appears
with high frequency in the population, with the remaining alleles having
negligible frequency, despite the fact that all alleles are selectively equivalent.
The imbalance arises because of stochastic effects, and is quite different
from that predicted by considering the mean allele frequencies only.

The eigenvalues of the matrix defined by the symmetric mutation model
are the values (3.34) if $\lambda_i$ is multiplied by $\{1 - uK(K-1)^{-1}\}^i$. The
multiplicity of $\lambda_i$ is $(i + K - 2)!/\{i!\,(K-2)!\}$.

In view of the comments concerning the Cannings model made in Section
3.7 it is plausible that (3.70) and (3.71) hold with $\theta$ defined by $\theta = 4Nu/\sigma^2$.
There is also a K-allele Moran model which allows various exact formulas,
but for this model interest centers more on the infinitely many alleles case,
to which we now turn.



3.6 Infinitely Many Alleles Models 

3.6.1 Introduction 

In this section we consider three so-called "infinitely many alleles" models,
namely the Wright-Fisher model, the Cannings model and the Moran model.
The discussion of the Wright-Fisher model is more extensive than that 
for the remaining models. This is not because it is more important than 
the other two: Indeed, the Wright-Fisher model is a particular case of the 
more general, and more plausible, Cannings model. The extensive discus- 
sion of the Wright-Fisher model arises for two reasons. The first of these 
is that calculations for this model are comparatively straightforward, and 
the second is that results for this model can be taken over almost directly 
for the Cannings model, with an appropriate change in the definition of the 
parameter 9 arising in many of the formulas found. 

Results for the Wright-Fisher and the Cannings infinitely many alleles 
models are usually diffusion approximations. By contrast, the infinitely 
many alleles Moran model allows many exact calculations. 

In Chapter 9 we discuss why infinitely many alleles models are of interest 
and will develop some of their properties at greater length length than is 
done in this section. 

3.6.2 The Wright-Fisher Infinitely Many Alleles Model 

The Wright-Fisher infinitely many alleles model follows the generic binomial
sampling characteristic of all Wright-Fisher models. Mutation is
intrinsic to the model, but the nature of the new mutants is different from
anything assumed so far, the key difference being that all mutant genes are
assumed to be of a new allelic type, not currently or previously seen in the
population. This implies that if the mutation rate is $u$, and if in generation
$t$ there are $X_i$ genes of allelic type $A_i$ ($i = 1, 2, 3, \ldots$), then the
probability that in generation $t + 1$ there will be $Y_i$ genes of allelic type
$A_i$, together with $Y_0$ new mutant genes, all of different novel allelic
types, is

$$\text{Prob}\{Y_0, Y_1, Y_2, \ldots \mid X_1, X_2, \ldots\} = \frac{(2N)!}{\prod_{i \geq 0} Y_i!} \prod_{i \geq 0} \pi_i^{Y_i}, \tag{3.72}$$

where $\pi_0 = u$ and $\pi_i = X_i(1 - u)/(2N)$, $i = 1, 2, 3, \ldots$.

This model differs fundamentally from previous mutation models (which
allow reverse mutation) in that since each allele will sooner or later be
lost from the population, there can exist no nontrivial stationary distribution
for the frequency of any allele. Nevertheless we are interested in
stationary behavior, and it is thus important to consider what concepts of
stationarity exist for this model. To do this we consider delabeled configurations
of the form $\{a, b, c, \ldots\}$, where such a configuration implies that
there exist $a$ genes of one allelic type, $b$ genes of another allelic type, and
so on. The specific allelic types involved are not of interest. The possible
configurations can be written down as $\{2N\}$, $\{2N - 1, 1\}$, $\{2N - 2, 2\}$,
$\{2N - 2, 1, 1\}$, $\ldots$, $\{1, 1, 1, \ldots, 1\}$ in dictionary order. The number of
such configurations is $p(2N)$, the number of partitions of $2N$ into positive
integers. For small values of $N$, values of $p(2N)$ are given by Abramowitz and
Stegun (1965, Table 24.5), who provide also asymptotic values for large $N$.
It is clear that (3.72) implies certain transition probabilities from one
configuration to another. Although these probabilities are extremely complex
and the Markov chain of configurations has an extremely large number of
states, nevertheless standard theory shows that there exists a stationary
distribution of configurations, some of the characteristics of which we now
explore.

We consider first the probability that two genes drawn at random are
of the same allelic type. For this to occur neither gene can be a mutant
and, further, both must be descended from the same parent gene (probability
$(2N)^{-1}$) or different parent genes which were of the same allelic type.
Writing $F_2^{(t)}$ for the desired probability in generation $t$, we get

$$F_2^{(t+1)} = (1 - u)^2\left((2N)^{-1} + \{1 - (2N)^{-1}\}F_2^{(t)}\right). \tag{3.73}$$

At equilibrium, $F_2^{(t+1)} = F_2^{(t)} = F_2$ and thus

$$F_2 = \{1 - 2N + 2N(1 - u)^{-2}\}^{-1} \approx (1 + \theta)^{-1}, \tag{3.74}$$

where, as is standard for Wright-Fisher models, $\theta = 4Nu$. This is identical
to the limiting ($K \to \infty$) value in (3.71), a fact that we return to
later.
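
The convergence of the recursion (3.73) to the equilibrium value (3.74) can be checked numerically. The Python sketch below iterates (3.73) from an initially monomorphic population ($F_2 = 1$); the function names and parameter values are illustrative assumptions, not part of the original text.

```python
def iterate_F2(N, u, generations, F_start=1.0):
    """Apply recursion (3.73) for the given number of generations."""
    F = F_start
    for _ in range(generations):
        F = (1 - u) ** 2 * (1 / (2 * N) + (1 - 1 / (2 * N)) * F)
    return F


def stationary_F2(N, u):
    """Exact equilibrium value in (3.74)."""
    return 1.0 / (1 - 2 * N + 2 * N * (1 - u) ** -2)


if __name__ == "__main__":
    N, u = 500, 1e-3                      # illustrative values, theta = 4Nu = 2
    theta = 4 * N * u
    print("iterated  :", iterate_F2(N, u, 5000))
    print("exact     :", stationary_F2(N, u))
    print("1/(1+theta):", 1 / (1 + theta))
```

The rate of approach to equilibrium is governed by the factor $(1 - u)^2\{1 - (2N)^{-1}\}$ multiplying $F_2^{(t)}$ in (3.73).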

Consider next the probability $F_3^{(t+1)}$ that three genes drawn at random
in generation $t + 1$ are of the same allelic type. These three genes will
all be descendants of the same gene in generation $t$ (probability $(2N)^{-2}$),
of two genes (probability $3(2N - 1)(2N)^{-2}$) or of three different genes
(probability $(2N - 1)(2N - 2)(2N)^{-2}$). Further, none of the genes can be
a mutant, and it follows that

$$F_3^{(t+1)} = (1 - u)^3 (2N)^{-2}\left(1 + 3(2N - 1)F_2^{(t)} + (2N - 1)(2N - 2)F_3^{(t)}\right). \tag{3.75}$$

At equilibrium $F_3^{(t+1)} = F_3^{(t)} = F_3$, and rearrangement in (3.75) yields

$$F_3 \approx 2(2 + \theta)^{-1}F_2 \approx 2!/[(1 + \theta)(2 + \theta)]. \tag{3.76}$$

Continuing in this way we find

$$F_i^{(t+1)} = (1 - u)^i\left\{(2N - 1)(2N - 2)\cdots(2N - i + 1)(2N)^{1-i}F_i^{(t)} + \text{terms in } F_{i-1}^{(t)}, \ldots, F_2^{(t)}\right\} \tag{3.77}$$

and that for small values of $i$,

$$F_i \approx (i - 1)!/[(1 + \theta)(2 + \theta)\cdots(i - 1 + \theta)]. \tag{3.78}$$

We can also interpret $F_i$ as the probability that a sample of $i$ genes contains
only one allelic type, or, in other words, that the sample configuration is
$\{i\}$. This conclusion may be used to find the probability of the sample
configuration $\{i - 1, 1\}$. The probability that in a sample of $i$ genes, the
first $i - 1$ genes are of one allelic type while the last gene is of a new allelic
type is $F_{i-1} - F_i$. The probability we require is, for $i \geq 3$, just $i$ times
this, or

$$\text{Prob}\{i - 1, 1\} = i\{F_{i-1} - F_i\} \approx i(i - 2)!\,\theta/[(1 + \theta)(2 + \theta)\cdots(i - 1 + \theta)]. \tag{3.79}$$

For $i = 2$ the required probability is

$$\text{Prob}\{1, 1\} \approx \theta/(1 + \theta). \tag{3.80}$$

The probabilities of other configurations can be built up in a similar way.
We illustrate this by considering the probability $F_{2,2}^{(t+1)}$ that, of four genes
drawn at random in generation $t + 1$, two are of one allelic type and two of
another. Clearly none of the genes can be a mutant, and furthermore they
will be descended from four different parent genes of configuration $\{2, 2\}$,
from three different parent genes of configuration $\{2, 1\}$, the singleton being
transmitted twice, or from two different parent genes, both transmitted
twice. Considering the probabilities of the various events, we find

$$F_{2,2}^{(t+1)} = (1 - u)^4 (2N)^{-3}\left((2N - 1)(2N - 2)(2N - 3)F_{2,2}^{(t)} + 2(2N - 1)(2N - 2)F_{2,1}^{(t)} + 3(2N - 1)F_{1,1}^{(t)}\right). \tag{3.81}$$

Retaining only leading-order terms and letting $t \to \infty$, we obtain

$$F_{2,2} \approx (3 + \theta)^{-1}F_{2,1} = 3\theta/((1 + \theta)(2 + \theta)(3 + \theta)). \tag{3.82}$$

Continuing in this way we find (Ewens (1972), Karlin and McGregor
(1972)) an approximating partition probability formula for a sample of
$n$ genes, where it is assumed that $n \ll N$. This formula can be presented
in various ways. Perhaps the most useful formula arises if we define
$\mathbf{A} = (A_1, A_2, \ldots, A_n)$, where $A_j$ is the (random) number of allelic
types represented by exactly $j$ genes in the sample. With this definition,

$$\text{Prob}(\mathbf{A} = \mathbf{a}) = \frac{n!\,\theta^{\sum a_j}}{1^{a_1} 2^{a_2} \cdots n^{a_n}\, a_1!\, a_2! \cdots a_n!\, S_n(\theta)}. \tag{3.83}$$

Here $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ and $S_n(\theta)$ is defined as
$\theta(\theta + 1)(\theta + 2)\cdots(\theta + n - 1)$.

It is necessary that $\sum jA_j = \sum ja_j = n$, and it is convenient to denote
$\sum A_j$, the (random) number of different allelic types seen in the sample,
by $K$, and $\sum a_j$, the corresponding observed number in a given sample,
by $k$. By suitable summation in (3.83) the probability distribution of the
random variable $K$ may be found as

$$\text{Prob}(K = k) = |S_n^k|\,\theta^k/S_n(\theta), \tag{3.84}$$

where $|S_n^k|$ is the coefficient of $\theta^k$ in $S_n(\theta)$. Thus $|S_n^k|$ is the
absolute value of a Stirling number of the first kind (see Abramowitz and
Stegun (1965)).
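From (3.84), the mean of $K$ is

$$E(K) = \frac{\theta}{\theta} + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \cdots + \frac{\theta}{\theta + n - 1}, \tag{3.85}$$

the variance of $K$ is

$$\text{var}(K) = \theta \sum_{j=1}^{n-1} \frac{j}{(\theta + j)^2}, \tag{3.86}$$

and the probability that $K = 1$ is

$$\text{Prob}(K = 1) = \frac{(n - 1)!}{(\theta + 1)(\theta + 2)\cdots(\theta + n - 1)}. \tag{3.87}$$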



A formula equivalent to (3.83) is the following. Suppose that in the sample
we observe $k$ different allelic types. We label these in some arbitrary
order as types $1, 2, \ldots, k$. Then the probability that $K = k$ and also that,
with the types labelled in the manner chosen, there are $n_1, n_2, \ldots, n_k$
genes respectively observed in the sample of these various types, is

$$\frac{n!\,\theta^k}{k!\, n_1 n_2 \cdots n_k\, S_n(\theta)}. \tag{3.88}$$

These various formulas lead to interesting questions of inference, which we
take up in detail in Sections 9.5 and 11.2.
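
The partition formula (3.83) is straightforward to evaluate directly. The following Python sketch enumerates all partitions of a small sample size and uses exact rational arithmetic to confirm that the probabilities sum to 1 and that the mean number of allelic types agrees with (3.85). The function names, and the representation of a configuration as a dictionary mapping $j$ to $a_j$, are choices made for this illustration only.

```python
from fractions import Fraction
from math import factorial


def rising_factorial(theta, n):
    """S_n(theta) = theta (theta + 1) ... (theta + n - 1)."""
    result = 1
    for j in range(n):
        result *= theta + j
    return result


def partitions(n, largest=None):
    """Yield each partition of n once, as a dict {part size j: count a_j}."""
    if largest is None:
        largest = n
    if n == 0:
        yield {}
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            config = dict(rest)
            config[part] = config.get(part, 0) + 1
            yield config


def ewens_probability(a, theta):
    """Probability (3.83) of the sample configuration a = {j: a_j}."""
    n = sum(j * a_j for j, a_j in a.items())
    k = sum(a.values())
    denominator = rising_factorial(theta, n)
    for j, a_j in a.items():
        denominator *= j ** a_j * factorial(a_j)
    return factorial(n) * theta ** k / denominator


if __name__ == "__main__":
    theta, n = Fraction(3, 2), 6          # illustrative values
    total = sum(ewens_probability(a, theta) for a in partitions(n))
    mean_k = sum(sum(a.values()) * ewens_probability(a, theta)
                 for a in partitions(n))
    print("sum of probabilities:", total)
    print("mean of K:", mean_k, "vs (3.85):",
          sum(theta / (theta + j) for j in range(n)))
```

Using `Fraction` for $\theta$ keeps every quantity exact, so the comparisons are identities rather than floating-point approximations.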

Equation (3.73) can be rewritten in the form

$$F_2^{(t+1)} - F_2^{(\infty)} = (1 - u)^2\{1 - (2N)^{-1}\}\{F_2^{(t)} - F_2^{(\infty)}\}, \tag{3.89}$$

and this implies that $(1 - u)^2\{1 - (2N)^{-1}\}$ is an eigenvalue of the Markov
chain configuration process discussed above. A similar argument using
(3.75) shows that a second eigenvalue is $(1 - u)^3\{1 - (2N)^{-1}\}\{1 - 2(2N)^{-1}\}$.
Equations (3.77) and (3.81) suggest that
$(1 - u)^4\{1 - (2N)^{-1}\}\{1 - 2(2N)^{-1}\}\{1 - 3(2N)^{-1}\}$ is an eigenvalue of
multiplicity 2. It is found more generally that

$$\lambda_i = (1 - u)^i\{1 - (2N)^{-1}\}\{1 - 2(2N)^{-1}\}\cdots\{1 - (i - 1)(2N)^{-1}\} \tag{3.90}$$

is an eigenvalue of the configuration process matrix and that its multiplicity
is $p(i) - p(i - 1)$, where $p(i)$ is the partition number given above. This
provides a complete listing of all the eigenvalues. For details see Ewens and
Kirby (1975).
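
The eigenvalues (3.90) and their multiplicities $p(i) - p(i - 1)$ can be tabulated for a small population. In the Python sketch below the partition numbers are computed by a standard dynamic-programming recurrence. As a consistency check, and on the assumption that the unit eigenvalue is counted separately from the values $i = 2, \ldots, 2N$, the multiplicities add up to $p(2N)$, the number of configurations. All function names are illustrative.

```python
def partition_numbers(n_max):
    """Return [p(0), p(1), ..., p(n_max)], the partition numbers."""
    p = [1] + [0] * n_max
    for part in range(1, n_max + 1):
        for total in range(part, n_max + 1):
            p[total] += p[total - part]
    return p


def eigenvalue(i, N, u):
    """The eigenvalue lambda_i of (3.90)."""
    value = (1 - u) ** i
    for j in range(1, i):
        value *= 1 - j / (2 * N)
    return value


def nonunit_spectrum(N, u):
    """List of (i, lambda_i, multiplicity) for i = 2, ..., 2N."""
    p = partition_numbers(2 * N)
    return [(i, eigenvalue(i, N, u), p[i] - p[i - 1])
            for i in range(2, 2 * N + 1)]


if __name__ == "__main__":
    N, u = 5, 0.01                        # illustrative values
    for i, lam, mult in nonunit_spectrum(N, u):
        print(i, round(lam, 6), mult)
```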

We consider next the mean number of alleles existing in the population
at any time. Any specific allele $A_m$ will be introduced into the population
with frequency $(2N)^{-1}$, and after a random number of generations will leave
it, never to return. The frequency of $A_m$ is a Markovian random variable
with transition matrix in (3.16), with $\psi_i$ defined immediately below (3.16).
There will exist a mean time $E(T)$ that $A_m$ remains in the population.
The mean number of new alleles to be formed each generation is $2Nu$, and
the mean number to be lost each generation through mutation and random
drift is $E(K)/E(T)$, where $E(K)$ is the mean number of alleles existing in
each generation. It follows, by balancing the number of alleles gained each
generation with the number lost, that at stationarity,

$$E(K) = 2Nu\,E(T). \tag{3.91}$$

An approximation to $E(T)$ is found by putting $p = (2N)^{-1}$ in (3.19). This
gives, to a close approximation,

$$E(K) \approx \theta + \int_{(2N)^{-1}}^{1} \theta x^{-1}(1 - x)^{\theta - 1}\,dx. \tag{3.92}$$

A more detailed approximation is possible. If $E(K(x_1, x_2))$ is the mean
number of alleles present in the population with frequency in any interval
$(x_1, x_2)$ ($(2N)^{-1} < x_1 < x_2 < 1$), then

$$E(K(x_1, x_2)) \approx \int_{x_1}^{x_2} \theta x^{-1}(1 - x)^{\theta - 1}\,dx. \tag{3.93}$$

This equation can be used to confirm (3.85). An allele whose population
frequency is $x$ is observed in a sample of size $n$ with probability
$1 - (1 - x)^n$. From this and (3.93) it follows that the mean number of
different alleles observed in a sample of size $n$ is approximately

$$\int_0^1 \{1 - (1 - x)^n\}\,\theta x^{-1}(1 - x)^{\theta - 1}\,dx, \tag{3.94}$$

and the value of this expression is equal to that given in (3.85). The function

$$\phi(x) = \theta x^{-1}(1 - x)^{\theta - 1} \tag{3.95}$$

is called the “frequency spectrum” of the process considered. Ignoring
small-order terms, it has the (equivalent) interpretations that the mean
number of alleles in the population whose frequency is in $(x, x + \delta x)$, and
also the probability that there exists an allele in the population whose
frequency is in this range, is, for small $\delta x$, equal to
$\theta x^{-1}(1 - x)^{\theta - 1}\,\delta x$.

The frequency spectrum can be used to arrive at further results reached
more laboriously by discrete distribution methods. Thus, for example,

$$\text{Prob}\{\text{only one allele observed in a sample of } n \text{ genes}\} = \theta\int_0^1 x^{n-1}(1 - x)^{\theta - 1}\,dx = (n - 1)!/((1 + \theta)(2 + \theta)\cdots(n - 1 + \theta)),$$

and this agrees with the expression in (3.78) with the notational change
of $n$ to $i$. More complex formulas such as (3.83) can be re-derived using
multivariate frequency spectra, but we do not pursue the details.
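
The agreement between the integral (3.94) and the sum (3.85) can be confirmed numerically. The Python sketch below uses a simple midpoint rule, with a value $\theta > 1$ chosen so that the integrand stays bounded near $x = 1$. The function names, step count and parameter values are assumptions of this example.

```python
def frequency_spectrum(x, theta):
    """The frequency spectrum (3.95)."""
    return theta * (1 - x) ** (theta - 1) / x


def mean_alleles_by_integration(n, theta, steps=100_000):
    """Midpoint-rule evaluation of the integral (3.94)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h                 # midpoints avoid x = 0 and x = 1
        total += (1 - (1 - x) ** n) * frequency_spectrum(x, theta)
    return total * h


def mean_alleles_by_sum(n, theta):
    """The mean number of allelic types in a sample, formula (3.85)."""
    return sum(theta / (theta + j) for j in range(n))


if __name__ == "__main__":
    n, theta = 10, 2.5                    # illustrative values
    print(mean_alleles_by_integration(n, theta))
    print(mean_alleles_by_sum(n, theta))
```

The identity behind the agreement is $1 - (1 - x)^n = x\sum_{j=0}^{n-1}(1 - x)^j$, which turns (3.94) into a sum of elementary integrals equal to $\theta/(\theta + j)$.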

The form of the frequency spectrum also shows that when $\theta$ is small, the
most likely situation to arise at any time is that where one allele has a high
frequency and the remaining alleles are all at a low frequency. This occurs
for two reasons. The first of these is historical: different alleles enter the
population at different times, and an “older” allele has had more time to
reach a high frequency than a “younger” allele. Second, imbalances in allelic
frequencies arise through stochastic fluctuations, as in the $K$-allele model
as discussed below (3.71). This imbalance agrees qualitatively with that
found in the $K$-allele model of Section 3.5. We shall later find a number of
uses for frequency spectra, all arising through their definitions in equations
of the form (3.93).

Although the theory is by no means clear, it is plausible that to a first
approximation, all the results given in this section continue to apply in
more complicated Wright-Fisher models, involving perhaps two sexes or
geographical structure, if the parameter $\theta$ is defined as

$$\theta = 4N_e u, \tag{3.96}$$

where $N_e$ is one or other version of the effective population size (see Section
3.7).

Various generalizations of the selectively neutral Wright-Fisher infinitely
many alleles model are possible. One generalization supposes that all
heterozygotes have the same fitness $1 + s$ ($s > 0$) and all homozygotes have
fitness 1. An extreme example arises for self-sterility alleles where homozygotes
cannot appear. For this case we put $s = \infty$. For selective models
the simple symmetry arguments which lead to (3.91) no longer apply, and
a more complex analysis is necessary. We consider this analysis further
in Chapter 5. A second generalization supposes that alleles fall into two
classes, with individuals having two “favored” alleles having fitness $1 + 2s$,
those having only one favored allele having fitness $1 + s$, and those with
no favored allele having fitness 1. This model also is considered
in more detail in Chapter 5.

3.6.3 The Cannings Infinitely Many Alleles Model 

The reproductive mechanism in the nonoverlapping generations Cannings
infinitely many alleles model follows the general principles of the
Cannings two-allele model of Section 3.3. That is, the model allows any
reproductive scheme consistent with the exchangeability and symmetry
properties of the two-allele model. The mean number of offspring genes
from any “parental” gene is 1, and the variance of the number of offspring
genes is $\sigma^2$, necessarily the same for each parental gene. The model follows
the mutation mechanism of the Wright-Fisher infinitely many alleles model
described above, in that all mutant offspring genes are assumed to be of
novel allelic types.

Many of the results of the Wright-Fisher infinitely many alleles model
apply for the Cannings model, at least to a close approximation, provided
that the parameter $\theta$, arising in many formulas in Section 3.6.2, is replaced
by $\theta/\sigma^2$, as justified by the discussion leading to (3.111) below. We therefore
use these Wright-Fisher formulae, with this change of definition, to apply
for the Cannings model.

3.6.4 The Moran Infinitely Many Alleles Model 

The Moran infinitely many alleles model is the natural extension to the
infinitely many alleles case of the Moran two-allele model considered in
Section 3.4. Haploid individuals, which we may identify with genes, are created
and lost through a birth and death process, as in the two-allele case,
but in the infinitely many alleles model it is assumed that an offspring gene
is a mutant with probability $u$ and that any new mutant is of an entirely
novel allelic type, not currently or previously existing in the population.

The stochastic behavior of the frequency of any allelic type in the population
is then governed by (3.57), implying that there can be no concept
of stationarity of the frequency of any nominated allelic type. On the other
hand, as with the Wright-Fisher and Cannings models, there will exist a
concept of the stationary distribution of allelic configurations. The possible
configurations of the process are the same as those for those models,
but for the Moran model an exact population probability can be given for
each configuration. Suppose that $\beta_j$ ($j = 1, 2, \ldots, 2N$) is the number of
allelic types with exactly $j$ representative genes in the population, so that
$\sum j\beta_j = 2N$. The quantity $\beta_j$ is the population analogue of the sample
number $a_j$ in (3.83). The exact stationary distribution of the population
configuration process is (Trajstman (1974))

$$\text{Prob}(\beta_1, \beta_2, \ldots, \beta_{2N}) = \frac{(2N)!\,\theta^{\sum \beta_j}}{1^{\beta_1} 2^{\beta_2} \cdots (2N)^{\beta_{2N}}\, \beta_1!\, \beta_2! \cdots \beta_{2N}!\, S_{2N}(\theta)}. \tag{3.97}$$

Here $S_j(\cdot)$ is defined below (3.83) and $\theta$ is defined for this model by

$$\theta = 2Nu/(1 - u). \tag{3.98}$$

This is a different definition of $\theta$ than that applying for the Wright-Fisher
model, the difference arising because of the effective population size
applying for the Moran model.

The expression (3.97) is of exactly the same form as (3.83), with $n$ replaced
by $2N$ and $a_j$ by $\beta_j$. Thus several of the calculations arising from
(3.83) are exact for the Moran population process. For example, the distribution
of the number $K_{2N}$ of allelic types in the population is given
exactly by (3.84), with $n$ replaced by $2N$. Thus, immediately from (3.84),
the monomorphism probability that $K_{2N} = 1$ is, exactly,

$$\frac{(2N - 1)!}{(1 + \theta)(2 + \theta)\cdots(2N - 1 + \theta)}. \tag{3.99}$$

The mean of $K_{2N}$ is given by (3.85), with $n$ replaced by $2N$ and $\theta$
defined by (3.98), and the variance of $K_{2N}$ is

$$\text{var}(K_{2N}) = \theta \sum_{j=1}^{2N-1} \frac{j}{(\theta + j)^2}. \tag{3.100}$$

A further exact result for the Moran model concerns its exact frequency
spectrum, for which (3.95) gives the diffusion approximation in the
Wright-Fisher model.

To find this we consider first the “two-allele” model (3.57). In the infinitely
many alleles case we think of $A_1$ as a newly arisen allele formed by
mutation and $A_2$ as all other alleles. Equation (2.160) can be used to find
the mean number $\mu(T)$ of birth and death events before its certain loss from
the population. This is

$$\mu(T) = (2N + \theta)\sum_{j=1}^{2N} j^{-1}\binom{2N}{j}\binom{2N + \theta - 1}{j}^{-1}. \tag{3.101}$$

In the case $\theta = 2$, this is about $2N\log(2N)$ birth and death events, or
about $\log(2N)$ “generations”. The corresponding approximation for the
Wright-Fisher model, found from (3.19), is also $\log(2N)$ generations, but
this formal equality is misleading because of the different definitions of $\theta$
in the two cases.

The expression (3.101) has the further interpretation that its typical term
is the mean number of birth and death events for which there are exactly $j$
copies of the allele in question before its loss from the population. The form
of ergodic argument that led to (3.92) shows that at stationarity, the mean
of the number $K_{2N}$ of different allelic types represented in the population
is $u\mu(T)$, which is

$$\sum_{j=1}^{2N} \frac{\theta}{j}\binom{2N}{j}\binom{2N + \theta - 1}{j}^{-1}, \tag{3.102}$$

where here and throughout we use the standard gamma function definition

$$\binom{M}{m} = \frac{\Gamma(M + 1)}{m!\,\Gamma(M - m + 1)} = \frac{M(M - 1)\cdots(M - m + 1)}{m!}$$

for non-integer $M$. The expression (3.102) simplifies to

$$\frac{\theta}{\theta} + \frac{\theta}{\theta + 1} + \frac{\theta}{\theta + 2} + \cdots + \frac{\theta}{\theta + 2N - 1}.$$

This is identical to the expression given in (3.85), with $n$ replaced by $2N$,
as we know it must be. However the expression (3.102) provides the further
information that the typical $j$th term gives the stationary mean number of
alleles arising with $j$ representing genes in the population at any time. In
other words, the exact frequency spectrum for the Moran model is

$$\frac{\theta}{j}\binom{2N}{j}\binom{2N + \theta - 1}{j}^{-1}, \quad j = 1, 2, \ldots, 2N. \tag{3.103}$$

A standard asymptotic formula for the gamma function for large $N$
shows the parallel between this exact expression and the diffusion theory
frequency spectrum (3.95).
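
The exact Moran frequency spectrum (3.103) can be evaluated with log-gamma functions, which avoids overflow in the generalized binomial coefficients. The Python sketch below checks three things: that the terms sum to the expression (3.85) with $n = 2N$, that the mean total number of genes $\sum j\,E(\beta_j)$ equals $2N$, and that for large $N$ the $j$th term is close to $(2N)^{-1}\phi(j/(2N))$ with $\phi$ as in (3.95). Function names and parameter values are illustrative assumptions.

```python
from math import exp, lgamma


def log_binomial(M, m):
    """log of Gamma(M + 1) / (m! Gamma(M - m + 1)), valid for non-integer M."""
    return lgamma(M + 1) - lgamma(m + 1) - lgamma(M - m + 1)


def moran_spectrum(N, theta):
    """Terms of (3.103) for j = 1, ..., 2N, returned as a list."""
    n = 2 * N
    return [theta / j * exp(log_binomial(n, j) - log_binomial(n + theta - 1, j))
            for j in range(1, n + 1)]


def diffusion_term(N, theta, j):
    """Diffusion counterpart of the j-th term: phi(x) / (2N) at x = j / (2N)."""
    x = j / (2 * N)
    return theta * (1 - x) ** (theta - 1) / (x * 2 * N)


if __name__ == "__main__":
    N, theta = 500, 2.5                   # illustrative values
    spectrum = moran_spectrum(N, theta)
    print("sum of terms:", sum(spectrum))
    print("(3.85) with n = 2N:", sum(theta / (theta + j) for j in range(2 * N)))
    print("j = 200 exact:", spectrum[199], "diffusion:", diffusion_term(N, theta, 200))
```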

Many further exact results for the Moran model are available. Many of 
these relate to “time” and “age” properties, and will be discussed at length 
in Chapter 9, where “time” and “age” questions are of central interest. 



3.7 The Effective Population Size 

While the Wright-Fisher model (1.48) is less plausible than several other 
available models as a description of biological reality, it has, perhaps for 
historical reasons, assumed a central place in population genetics theory. 
We have already noted three properties of this model: 

(i) its maximum nonunit eigenvalue $= 1 - (2N)^{-1}$,

(ii) the probability that two genes taken at random are descendants of
the same parent gene $= (2N)^{-1}$,

(iii) $\text{var}\{x(t + 1) \mid x(t)\} = x(t)\{1 - x(t)\}/(2N)$, where $x(t)$ is the
fraction of $A_1$ genes in generation $t$.

In view of these properties it is perhaps natural, if the Wright-Fisher
model (1.48) is to be used as a standard, to define the effective population
size in diploid models that are more complicated and realistic than (1.48)
in the following way:

$$N_e^{(e)} = \text{eigenvalue effective population size} = \tfrac{1}{2}(1 - \lambda_{\max})^{-1}, \tag{3.104}$$

$$N_e^{(i)} = \text{inbreeding effective population size} = (2\pi_2)^{-1}, \tag{3.105}$$

$$N_e^{(v)} = \text{variance effective population size} = \frac{x(t)\{1 - x(t)\}}{2\,\text{var}\{x(t + 1) \mid x(t)\}}. \tag{3.106}$$

Here $\lambda_{\max}$ is the largest nonunit eigenvalue of the transition matrix of the
model considered and $\pi_2$ is the probability, in this model, that two genes
taken at random in any generation are descendants of the same parent
gene. Similarly, $\text{var}\{x(t + 1) \mid x(t)\}$ is the conditional variance of the
frequency of $A_1$ in generation $t + 1$ in the more complicated model, given
the value of this frequency in generation $t$.

A fourth concept of effective population size, namely the mutation effective
size, is also possible (Ewens (1989)) but we do not consider this
concept here.

Our aim is to compute the three effective population sizes defined above
for two classes of models that generalize the simple Wright-Fisher model
(1.48). The first class is the Cannings model considered in Section 3.3 and
the second comprises Wright-Fisher models that incorporate complicating
features such as two sexes, geographical subdivision, fluctuating population
sizes, and so on.

We consider first the Cannings model, and limit attention for the moment
to those versions of the model where generations do not overlap. Equations
(3.36) and (3.104) show immediately that for these models, the eigenvalue
effective population size $N_e^{(e)}$ is given by

$$N_e^{(e)} = (N - \tfrac{1}{2})/\sigma^2, \tag{3.107}$$

where, as in Section 3.3, $\sigma^2$ is the variance in the number of offspring genes
from any given gene. Equations (3.38) and (3.106) show that the variance
effective population size $N_e^{(v)}$ is given by

$$N_e^{(v)} = (N - \tfrac{1}{2})/\sigma^2. \tag{3.108}$$

A value for $N_e^{(i)}$ can be found in the following way. Suppose that the $i$th
gene in generation $t$ leaves $m_i$ offspring genes in generation $t + 1$
($\sum m_i = 2N$). Then the probability, given $m_1, \ldots, m_{2N}$, that two genes
drawn at random in generation $t + 1$ are descendants of the same gene is

$$\sum_{i=1}^{2N} m_i(m_i - 1)/\{2N(2N - 1)\}. \tag{3.109}$$

The probability $\pi_2$ in (3.105) is the expectation of this quantity. Now $m_i$
has mean unity and variance $\sigma^2$, so that on taking expectations,
$\pi_2 = \sigma^2/(2N - 1)$. From this,

$$N_e^{(i)} = (N - \tfrac{1}{2})/\sigma^2. \tag{3.110}$$

It follows from these various equations that for the Cannings model, all
three effective population sizes are equal.
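
The chain of reasoning from (3.109) to (3.110) can be followed in a few lines of Python. The sketch below evaluates the conditional probability (3.109) for given offspring counts, and then checks the Wright-Fisher special case: under multinomial sampling each gene's offspring number has variance $\sigma^2 = 1 - (2N)^{-1}$, so that $(N - \frac{1}{2})/\sigma^2$ reduces to $N$ exactly. Function names and numerical values are assumptions of this example.

```python
def shared_parent_probability(offspring_counts):
    """Conditional probability (3.109), given the offspring counts m_i."""
    two_n = sum(offspring_counts)
    pairs = sum(m * (m - 1) for m in offspring_counts)
    return pairs / (two_n * (two_n - 1))


def expected_pi2(N, sigma2):
    """pi_2 = sigma^2 / (2N - 1), the expectation of (3.109)."""
    return sigma2 / (2 * N - 1)


def inbreeding_effective_size(pi2):
    """Definition (3.105)."""
    return 1.0 / (2.0 * pi2)


def cannings_effective_size(N, sigma2):
    """Common value (N - 1/2) / sigma^2 of (3.107), (3.108) and (3.110)."""
    return (N - 0.5) / sigma2


if __name__ == "__main__":
    N = 50                                 # illustrative value
    sigma2_wf = 1 - 1 / (2 * N)            # multinomial (Wright-Fisher) variance
    print(cannings_effective_size(N, sigma2_wf))
    print(inbreeding_effective_size(expected_pi2(N, sigma2_wf)))
    print(shared_parent_probability([2, 0] * N))
```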

One application of this conclusion is the following. If leading terms only
are retained, all three definitions of the effective population size in the
Cannings model are $N/\sigma^2$. From the remarks surrounding (3.96), it is plausible
that the various Wright-Fisher infinitely many alleles model results given
in Section 3.6 apply for the nonoverlapping generation Cannings model if
$\theta$ is defined wherever it occurs by $4N_e u$. That is, to a close approximation,
we define $\theta$ for the Cannings model by

$$\theta = 4Nu/\sigma^2. \tag{3.111}$$

This definition is used for the calculations in Chapters 9 and 10, where the
Cannings model plays an important role.

The above definitions of the effective population size are not appropriate
for models such as (3.30) where generations overlap. If we write $N_e$ for
any one of the effective population sizes defined in (3.104)-(3.106),
it seems reasonable for such models to define the effective population size
as $N_e k/(2N)$, where $k$ is the number of individuals to die each time unit.
Since $k = 2N$ for models where generations do not overlap, this leaves
(3.104)-(3.106) unchanged for such models. For the Moran model (3.30),
where $k = 1$, this convention yields

$$N_e^{(e)} = N_e^{(i)} = N_e^{(v)} = \tfrac{1}{2}N. \tag{3.112}$$

However, in contrast to our approach for the Cannings model, we do not
use this observation to use Wright-Fisher diffusion approximation results
from Section 3.6 for the Moran infinitely many alleles model, since exact
calculations are available for that model, as described in Section 9.3. Our
interest in (3.112) arises for another reason, namely that it shows that the
effective population size in the Moran model is half that in the
Wright-Fisher model. We now discuss the reason for this.

Arguments parallel to those leading to (3.5) show that if two alleles $A_1$
and $A_2$ are allowed in the population, the mean time until fixation of one
or other allele in the Cannings model is

$$\bar{t}(p) \approx -(4N - 2)\{p\log p + (1 - p)\log(1 - p)\}/\sigma^2, \tag{3.113}$$

where $p$ is the initial frequency of $A_1$ and $\sigma^2$ is defined above. This formula
explains the factor of 2 discussed after equation (3.60). In the Wright-Fisher
model $\sigma^2 \approx 1$ while in the Moran model $\sigma^2 \approx 2/(2N)$. Setting aside
the factor $2N$ as explained by the conversion from generations to birth
and death events, it is clear that the crucial factor is the difference in the
variance in offspring distribution. It is also this factor which leads to the
difference between (3.31) and (3.67) and that between other similar pairs
of formulas.

So far we have ignored the diploid nature of most organisms of interest,
and we now consider a definition of effective population size for the diploid
case. We do this here for a Cannings model. An inbreeding effective population
number is sometimes defined where attention is focussed on the
diploid nature of the organisms in the population. This number will be
denoted $N_e^{(id)}$, and is defined as the reciprocal of the probability that two
genes taken at random in generation $t + 1$ are descended from the same
individual in generation $t$. This is tantamount, in the Cannings model, to
selecting two genes at random in generation $t$ and asking whether the two
genes drawn at random in generation $t + 1$ are both descended from one
or other or both of these. In the notation of (3.109), and writing $M_i$ for the
number of offspring genes from the $i$th individual, the probability of this
event can be written as the expected value of

$$\sum_{i=1}^{N} M_i(M_i - 1)/\{2N(2N - 1)\}. \tag{3.114}$$

It is not hard to see this leads to

$$N_e^{(id)} = \frac{4N - 2}{\sigma_d^2 + 2}, \tag{3.115}$$

where $\sigma_d^2$ is the variance of the number of offspring genes from each (diploid)
individual. It is therefore necessary to extend the definition of a Cannings
model to the diploid case. We define a diploid Cannings model as one for
which the concept of exchangeability given in Section 3.2 relates to monoecious
diploid individuals. We also assume that the gene transmitted by any
individual to any offspring is equally likely to be each of the two genes in
that individual, is independent of the gene(s) transmitted by this individual
to any other offspring, and is also independent of the genes transmitted
by any other individual. With these conventions it can be shown that

$$\sigma^2 = \frac{\sigma_d^2 + 2}{4}, \tag{3.116}$$

where $\sigma^2$ is the Cannings model gene “offspring number” variance, and from
this it follows that the expressions in (3.110) and (3.115) are identical.

We turn next to the second class of models where a definition of effective
population size is useful, namely those Wright-Fisher models which
attempt to incorporate biological complexity more than does the simple
Wright-Fisher model (1.48).

The first model considered allows for the existence of two sexes. Suppose
in any generation there are $N_1$ diploid males and $N_2$ diploid females, with
$N_1 + N_2 = N$. The model assumes that the genetic make-up of each individual
in the daughter generation is found by drawing one gene at random,
with replacement, from the male pool of genes, and similarly one gene with
replacement from the female pool. If $X_1(t)$ represents the number of $A_1$
genes among males in generation $t$ and $X_2(t)$ the corresponding number
among females, then $X_1(t + 1)$ can be represented in the form

$$X_1(t + 1) = i(t + 1) + j(t + 1), \tag{3.117}$$

where $i(t + 1)$ has a binomial distribution with parameter $X_1(t)/(2N_1)$
and index $N_1$, and $j(t + 1)$ has a binomial distribution with parameter
$X_2(t)/(2N_2)$ and index $N_1$. A similar remark applies to $X_2(t + 1)$, where
now the index is $N_2$ rather than $N_1$. Evidently the pair $\{X_1(t), X_2(t)\}$ is
Markovian, and there will exist a transition matrix whose leading nonunit
eigenvalue we require to find so that we can calculate $N_e^{(e)}$.

To do this we use the theory of Appendix A. It is necessary to find some
function $Y(X_1, X_2)$ which is zero in the absorbing states of the system,
positive otherwise, and for which

$$E[Y\{X_1(t + 1), X_2(t + 1)\} \mid X_1(t), X_2(t)] = \lambda Y\{X_1(t), X_2(t)\} \tag{3.118}$$

for some constant $\lambda$. Such a function always exists, but some trial and error
is usually necessary to find it. In the present case it is found, after much
labor, that a suitable function is

$$Y(X_1, X_2) = \tfrac{1}{2}C\{X_1(2N_1 - X_1)(2N_1)^{-2} + X_2(2N_2 - X_2)(2N_2)^{-2}\} + \{1 - (X_1 - N_1)(X_2 - N_2)N_1^{-1}N_2^{-1}\}, \tag{3.119}$$

where

$$C = \tfrac{1}{2}\{1 + (1 - 2N_1^{-1} - 2N_2^{-1})^{1/2}\}.$$

With this definition the eigenvalue $\lambda$ becomes

$$\lambda = \tfrac{1}{2}\left[1 - (4N_1)^{-1} - (4N_2)^{-1} + \{1 + N^2(4N_1N_2)^{-2}\}^{1/2}\right], \tag{3.120}$$

or approximately

$$\lambda \approx 1 - (N_1 + N_2)(8N_1N_2)^{-1}. \tag{3.121}$$

From this result and (3.104) it follows that to a close approximation,

$$N_e^{(e)} = 4N_1N_2N^{-1}. \tag{3.122}$$

If $N_1 = N_2$ ($= \tfrac{1}{2}N$), then $N_e^{(e)} \approx N$, as we might expect, while if $N_1$ is
very small and $N_2$ is large, $N_e^{(e)} \approx 4N_1$. This latter value is sometimes of
use in certain animal breeding programs.
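
The quality of the approximation (3.122) can be examined by inserting the eigenvalue (3.120) into the definition (3.104). The Python sketch below does this for an equal sex ratio and for a strongly skewed one. The function names and the population sizes are hypothetical choices for this illustration.

```python
from math import sqrt


def two_sex_eigenvalue(N1, N2):
    """Leading nonunit eigenvalue (3.120) for N1 males and N2 females."""
    N = N1 + N2
    return 0.5 * (1 - 1 / (4 * N1) - 1 / (4 * N2)
                  + sqrt(1 + (N / (4 * N1 * N2)) ** 2))


def eigenvalue_effective_size(lam):
    """Definition (3.104): half the reciprocal of 1 - lambda."""
    return 0.5 / (1 - lam)


def two_sex_effective_size(N1, N2):
    """Approximation (3.122): 4 N1 N2 / N."""
    return 4 * N1 * N2 / (N1 + N2)


if __name__ == "__main__":
    for N1, N2 in ((50, 50), (5, 500)):   # illustrative population sizes
        exact = eigenvalue_effective_size(two_sex_eigenvalue(N1, N2))
        print(N1, N2, exact, two_sex_effective_size(N1, N2))
```

With few males and many females the value stays close to $4N_1$, as stated above, however large $N_2$ becomes.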

The inbreeding effective population size is found much more readily. Two
genes taken at random in any generation will have identical parent genes if
both are descended from the same “male” gene or both from the same “female”
gene. The probability of identical parentage is thus

$$\pi_2 = \tfrac{1}{4}\{(2N_1)^{-1} + (2N_2)^{-1}\},$$

and from this it follows that

$$N_e^{(i)} = (2\pi_2)^{-1} \approx 4N_1N_2N^{-1}. \tag{3.123}$$

The variance effective population size cannot be found so readily, and
indeed strictly it is impossible to use (3.106) to find such a quantity, since
an equation of this form does not exist in the two-sex case we consider. The
fraction of $A_1$ genes is not a Markovian variable and in particular, using
the notation of (3.106), the variance of $x(t + 1)$ cannot be given in terms
of $x(t)$ alone. This indicates a real deficiency in this mode of definition
of effective population size. On the other hand, we shall see in the next
chapter that sometimes a “quasi-Markovian” variable exists in terms of
which a generalized expression for the variance effective population size
may be defined. In the present case the weighted fraction of $A_1$ genes,
defined as

$$x(t) = X_1(t)/(4N_1) + X_2(t)/(4N_2),$$

has the required quasi-Markovian properties, and

$$\text{var}\{x(t + 1) \mid x(t)\} = x(t)\{1 - x(t)\}N(8N_1N_2)^{-1} + O(N_1^{-2}, N_2^{-2}).$$

From this a generalized variance effective population size may be defined,
in conjunction with (3.106), as

$$N_e^{(v)} = 4N_1N_2N^{-1}. \tag{3.124}$$

Thus for this model, $N_e^{(e)} \approx N_e^{(i)} \approx N_e^{(v)}$, although strict equality does
not hold for any of these relations.

We return now to the case of a monoecious population and consider 
complications due to geographical structure. A simplified model for this 
situation which, despite its obvious biological unreality, is useful in reveal- 
ing the effect of population subdivision, has been given by Moran (1962). It 
is supposed that the total population, of size $N(H+1)$, is subdivided into
$H+1$ subpopulations each of size $N$, and that in each generation $K$ genes
chosen at random migrate from subpopulation $i$ to subpopulation $j$ for all
$i$, $j$ ($i \neq j$). Suppose that in subpopulation $i$ there are $X_i(t)$ $A_1$ genes
in generation $t$. There is no single Markovian variable describing the
behavior of the total population, but the quantities $X_i(t)$ are jointly
Markovian, and to find $N_e^{(e)}$ it is necessary to find some function
$Y(t) = Y\{X_1(t), \ldots, X_{H+1}(t)\}$ obeying the requirements of Appendix A.
It is found, after some trial and error, that a suitable function $Y(t)$ is

$$Y(t) = \left[A - D + \{(A-D)^2 + 4BC\}^{1/2}\right]\sum_i X_i(t)\{2N - X_i(t)\} + 2B\sum_{i \neq j} X_i(t)\{2N - X_j(t)\}, \tag{3.125}$$
where

$$A = (4N^2 + H^2K^2 + K^2H - 2N - 4NKH)/(4N^2),$$

$$B = (4KN - K^2H - K^2)/(4N^2),$$

$$C = (4HKN - K^2H^2 - K^2H)/(4N^2),$$

$$D = (4N^2 + HK^2 + K^2 - 4NK)/(4N^2).$$

With this definition of $Y(t)$, the eigenvalue $\lambda$ satisfying

$$E\{Y(t+1) \mid X_1(t), \ldots, X_{H+1}(t)\} = \lambda Y(t)$$

is

$$\lambda = \tfrac{1}{2}\left(A + D + \{(A-D)^2 + 4BC\}^{1/2}\right). \tag{3.126}$$

If small-order terms are ignored, this yields eventually

$$N_e^{(e)} \approx N(H+1)\{1 + (2K(H+1))^{-1}\} \tag{3.127}$$

for large $H$ and $K$. This equation is in fact accurate to within 10% even
for $H = K = 1$, and it thus reveals that population subdivision leads to
only a slight increase in the eigenvalue effective population size compared
to the value $N(H+1)$ obtaining with no subdivision.
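
The accuracy claim for (3.127) can be examined directly from (3.126). The sketch below is illustrative only: it assumes the relation $\lambda = 1 - (2N_e^{(e)})^{-1}$ between the eigenvalue and the effective size, and it takes the coefficient $D$ with the term $-4NK$, so that $B + D = 1$. The function names are hypothetical.

```python
import math

def subdivision_eigenvalue(n: int, h: int, k: int) -> float:
    """Eigenvalue (3.126) for H + 1 subpopulations of size N, K migrant genes."""
    denom = 4.0 * n * n
    a = (4*n*n + h*h*k*k + k*k*h - 2*n - 4*n*k*h) / denom
    b = (4*k*n - k*k*h - k*k) / denom
    c = (4*h*k*n - k*k*h*h - k*k*h) / denom
    d = (4*n*n + h*k*k + k*k - 4*n*k) / denom  # assumed -4NK term, so b + d = 1
    return 0.5 * (a + d + math.sqrt((a - d) ** 2 + 4.0 * b * c))

def exact_effective_size(n: int, h: int, k: int) -> float:
    """Ne from lam = 1 - 1/(2 Ne) (assumed relation)."""
    return 1.0 / (2.0 * (1.0 - subdivision_eigenvalue(n, h, k)))

def approx_effective_size(n: int, h: int, k: int) -> float:
    """Approximation (3.127)."""
    return n * (h + 1) * (1.0 + 1.0 / (2.0 * k * (h + 1)))

if __name__ == "__main__":
    for n, h, k in [(100, 1, 1), (1000, 10, 5)]:
        print(n, h, k, round(exact_effective_size(n, h, k), 1),
              round(approx_effective_size(n, h, k), 1))
```

In both parameter sets the value obtained from the eigenvalue lies slightly above the census size $N(H+1)$ and within 10% of (3.127).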

The inbreeding effective population size $N_e^{(i)}$ can be found most
efficiently by noting that it is independent of $K$, since the act of migration
is irrelevant to the computation of its numerical value. Thus immediately
from (3.110)

$$N_e^{(i)} = \{N(H+1) - \tfrac{1}{2}\}/\{1 - (2N)^{-1}\}, \tag{3.128}$$

since each gene produces a number of offspring according to a binomial
distribution with index $2N$ and parameter $(2N)^{-1}$. This value clearly differs
only trivially from the true population size $N(H+1)$ and, for small $H$ and
$K$, it differs slightly from $N_e^{(e)}$.

Because of these two results, one may be tempted to ignore geographical 
sub-division in modeling evolutionary population genetic processes. 

The computation of $N_e^{(v)}$ is beset with substantial difficulties since
there exists no scalar Markovian variable for the model. Indeed, unless
migration rates are of a large order of magnitude, there is not even a
“quasi-Markovian” variable. Because of this no satisfactory value for
$N_e^{(v)}$ has yet been put forward for the geographical structure case.

We consider finally a population whose size assumes cyclically the
sequence of values $N_1, N_2, N_3, \ldots, N_k, N_1, N_2, \ldots$. There is no unique value
of $N_e^{(e)}$, $N_e^{(i)}$ or $N_e^{(v)}$ in this case, and it is convenient to extend our
previous definition to cover $k$ consecutive generations of the process. If the
population size in generation $t+k$ is $N_i$, it is easy to see that if $X(t)$ is the
number of $A_1$ genes in generation $t$, and in each generation reproduction
occurs according to the model (1.48),

$$E[X(t+k)\{2N_i - X(t+k)\} \mid X(t)] = X(t)\{2N_i - X(t)\}\prod_{i=1}^{k}\{1 - (2N_i)^{-1}\}.$$

Defining now $N_e^{(e)}$ by the equation

$$\{1 - (2N_e^{(e)})^{-1}\}^k = \prod_{i=1}^{k}\{1 - (2N_i)^{-1}\},$$

it is clear that if $k$ is small and the $N_i$ large,

$$N_e^{(e)} \approx k\{N_1^{-1} + \cdots + N_k^{-1}\}^{-1}. \tag{3.129}$$

Thus the eigenvalue effective population size is effectively the harmonic
mean of the various population sizes taken during the $k$-generation cycle.
A parallel formula holds for $N_e^{(i)}$, although here it is easier to work through
the probability $Q(t+k)$ that two genes in generation $t+k$ do not have the
same ancestor in generation $t$. Clearly

$$Q(t+k) = \{1 - (2N_{i-1})^{-1}\}Q(t+k-1),$$

and iteration over $k$ generations gives

$$Q(t+k) = \prod_{i=1}^{k}\{1 - (2N_i)^{-1}\}Q(t).$$

Elementary calculations now show that $N_e^{(i)}$ is also essentially equal to the
harmonic mean of the various population sizes. Again, if $x(t)$ is the fraction
of $A_1$ genes in generation $t$,

$$\operatorname{var}\{x(t+k) \mid x(t)\} = \tfrac{1}{2}\{N_1^{-1} + N_2^{-1} + \cdots + N_k^{-1}\}x(t)\{1 - x(t)\} + O(N_i^{-2}).$$

This shows that to a suitable approximation, $N_e^{(v)}$ is also the harmonic
mean of the various population sizes.
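
The harmonic-mean approximation (3.129) is easy to compare with the value defined by the product of the factors $1 - (2N_i)^{-1}$. The following Python sketch does this for a hypothetical three-generation cycle; the cycle sizes and function names are illustrative assumptions, not values from the text.

```python
def cyclic_effective_size(sizes):
    """Ne defined by {1 - 1/(2 Ne)}^k = prod_i {1 - 1/(2 N_i)}."""
    k = len(sizes)
    prod = 1.0
    for n in sizes:
        prod *= 1.0 - 1.0 / (2.0 * n)
    return 1.0 / (2.0 * (1.0 - prod ** (1.0 / k)))

def harmonic_mean(sizes):
    """Right-hand side of (3.129)."""
    return len(sizes) / sum(1.0 / n for n in sizes)

if __name__ == "__main__":
    cycle = [1000, 100, 10]  # hypothetical cycle of population sizes
    print(round(cyclic_effective_size(cycle), 2))   # value from the defining equation
    print(round(harmonic_mean(cycle), 2))           # harmonic mean
    print(sum(cycle) / len(cycle))                  # arithmetic mean, for contrast
```

For this cycle both effective sizes are close to 27, far below the arithmetic mean of 370, which illustrates how strongly the smallest size in the cycle dominates.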

This conclusion has been generalized by Karlin (1968). Karlin assumed
that in any generation the population size takes one or other of the values
$N_1, N_2, N_3, \ldots, N_m$ according to Markov chain rules, so that there exists a
probability $q_{ij}$ of a transition from a population of size $N_i$ to a population
of size $N_j$. The cyclic case just considered arises if $q_{i,i+1} = 1$ for
$i = 1, 2, \ldots, m-1$, $q_{m1} = 1$. The leading nonunit eigenvalue, and hence $N_e^{(e)}$,
depends on the transition matrix $\{q_{ij}\}$ as well as the particular form for
$f(s)$ assumed in (3.41). Explicit effective population size values are hard
to achieve in general, but in all cases for which expressions can be found,
$N_e^{(e)}$ is close to the weighted harmonic mean of the possible population
sizes. Thus if $\{f_i\}$ is assumed to be a Poisson distribution, leading to a
generalized binomial transition probability extending (1.48), and if $q_{ij} = q_j$
for all $i$, the leading nonunit eigenvalue is

$$\lambda = 1 - \tfrac{1}{2}\sum_j q_j N_j^{-1},$$

so that

$$N_e^{(e)} = \Big[\sum_j q_j N_j^{-1}\Big]^{-1}.$$

So far we have ignored the effects of age structure. The definition and 
calculation of the variance effective population size in the age-structure 
case is given by Pollak (2000).

We conclude with some general observations. First, we have only consid- 
ered one source of complexity at a time in computing effective population 
sizes. While the theory is no doubt very complex, it is reasonable to 
hope that the effective population size in, for example, a geographically 
subdivided population admitting two sexes would be given by a natural 
composition of the effective sizes for the subdivision and the two-sex cases 
respectively. 

Second, many papers and several textbooks use what appears to be a
definition of $N_e$ based on the outcome of a given experiment or on a
given field observation. Thus, for example, in the (diploid) formula (3.115)
the symbol $\sigma^2$ is sometimes replaced by $V$, where $V$ is defined by

$$V = \sum_i (n_i - 2)^2/N,$$

where $n_i$ is the (random) number of genes produced by the $i$th individual.
While such a definition might be of use in a retrospective analysis of a
given experiment, it is not allowable for theoretical purposes since $V$ is a
random variable and thus can take quite different values for two different
populations that have identical properties and thus must have the same
value of $N_e$.

Third, all three effective population sizes suffer some defects. Thus
$N_e^{(e)}$ and $N_e^{(v)}$ are defined assuming two alleles at the locus in question.
$N_e^{(i)}$ is not defined in terms of allelic type and is thus possibly superior to
$N_e^{(e)}$ and $N_e^{(v)}$ as a pure measure of population structure although, as we note
in a moment, each expression has its special interpretation and usefulness.
Further, $N_e^{(i)}$ is not of much value in characterizing various properties of
the geographically structured case.

Fourth, although the three effective sizes are often nearly equal in the 
examples considered above, they can in other cases differ substantially. 
This occurs particularly in populations with nonconstant size. An extreme 
example arises when a single heterozygote in generation $t$ gives rise to a
very large number of offspring in generation $t+1$. Here both $N_e^{(e)}$ and $N_e^{(v)}$
are very large, but $N_e^{(i)}$ is unity. Thus $N_e^{(e)}$ and $N_e^{(v)}$ tend to be defined in
terms of the future evolution of the population, whereas $N_e^{(i)}$ is concerned
more with its past. In some circumstances one effective size is of most
interest and for other circumstances, another.

Fifth, we have not defined an effective population size for continuous- 
time models. There is no reason to believe that these differ significantly 
from those given above. Some specific formulae are given by Felsenstein 
(1971), Hill (1972) and Kimura and Crow (1972). Nor have we considered 
the complications that can occur when fertility parameters are inherited 
(see, for example, Nei (1966)), or in a variety of other situations.

Sixth, problems arise with the definition of an effective population size in 
cases such as the human population when the population size has steadily 
increased. None of the definitions of the effective population size given 
above handles this situation in a satisfactory way. Given the current focus 
on the evolution of the human population, this is particularly unfortunate. 

Seventh, we recall what is perhaps our main motive in defining an 
effective population size, namely to consider whether various complex pop- 
ulation structures can lead to a significantly increased importance for 
random drift compared to its importance in the model (1.48). Our con- 
clusions show that this occurs when there is extremely large variance in 
offspring number, when the population size is cyclic and the smallest size 
the population assumes during the cycle is very small and when, in a 
dioecious population, the number of breeding individuals in one sex is 
very small. In all other cases, particularly in the case of geographically 
subdivided populations, there appears to be little significant scope for 
random drift beyond that applying for the model (1.48). We discuss the 
consequences of some of these observations later in this book. 

Finally, the concept of the effective population size is widely misused in 
the literature, especially in areas outside of, but associated with, evolu- 
tionary genetics. In particular, its connection to the simple Wright-Fisher 
model (1.48) seems to be widely unknown. The effective population size of 
some population is no more than the size of a simple Wright-Fisher model 
population having some characteristic in common with the population of 
interest. The effective sizes for two different characteristics (for example, 
variance, inbreeding) might very well differ, so that the purpose for using 
the effective population size concept is relevant. The value of the concept is 
that calculations applying for the simple Wright-Fisher model, especially 
diffusion theory calculations, can sometimes, and for some specific purpose, 
be used to provide results for the population of interest, replacing $N$
wherever it appears by the appropriate $N_e$. (Even this claim is essentially a
heuristic one, and the theory for using Wright-Fisher model formulas for
non-Wright-Fisher models, with this substitution, is not well developed.)
In any event, since the model (1.48) provides at best a rough approxima- 
tion to reality, the implication that however it is calculated, the effective 
size bears any necessary similarity to an actual population size is without 
any foundation. 

3.8 Frequency-Dependent Selection

There is no requirement that the fitness values $w_{ij}$ in the model defined
by (3.16) and (3.29) should be fixed constants, although so far we have
assumed that they are. The analysis we have carried out, in particular the
derivation of (3.30), continues to be valid when the $w_{ij}$ (or $s$ and $h$ in
(3.30)) are functions of $x$. This can lead to some interesting consequences.
Thus for small $t$ the fitness scheme

$$w_{11} = 1 + t(1 - x), \quad w_{12} = 1, \quad w_{22} = 1 + \tfrac{1}{2}tx,$$

is equivalent to (1.25b) if we put $h = \tfrac{1}{2}x/(1 - x)$. Using this value in (3.30)
gives $\pi(x) = x$. Thus survival probabilities for this frequency-dependent
fitness scheme are the same as those obtaining when there are no fitness
differentials.



3.9 Two Loci 

In this section we consider two different two-locus Markov chain analogues
of the one-locus model (1.48). While a good deal of progress on the prob- 
lems considered in this section is possible using diffusion theory (Ohta and 
Kimura (1969a, b), Littler (1973)), we defer consideration of a diffusion 
analysis to Section 6.6, and consider here only results found from Markov 
chain theory. 

The first of the two-locus models that we consider is the “random union 
of zygotes” model of Kimura (1963), Watterson (1970), Serant and Villard 
(1972) and Littler (1973), and the second is the “random union of gametes” 
model of Karlin and McGregor (1968) and Hill and Robertson (1966, 1968). 
For convenience we call these here the RUZ and RUG models, respectively. 
A general theory of Weir and Cockerham (1974) yields many results for 
both models. We follow as far as possible the notation of Section 2.10 in 
discussing these models. 

The RUZ model is defined as follows. Suppose a population of fixed size
$N$ contains, in generation $t$, $X_{ij}(t)$ individuals whose genotype is made up
of gametic types $i$ and $j$ ($i \le j$) (for a definition of a gamete of type $i$, see
Section 2.10). Let $a_{ijk}$ be the probability that a gamete produced by such
an individual is of type $k$. When $(i, j) = (1, 4)$ or $(2, 3)$, these probabilities
will involve the recombination fraction $R$. Then

$$c_k(t) = \sum_{i \le j} X_{ij}(t)\,a_{ijk}N^{-1}$$

is the probability that a gamete chosen at random forming generation $t+1$ is
of type $k$. It is now assumed that the $N$ individuals in generation $t+1$ have
their gametes determined by $2N$ independent trials, with the probability
of a gamete of type $k$ on any trial being $c_k(t)$.

The values $X_{ij}(t+1)$ are thus determined from the $X_{ij}(t)$ only through
the quantities $c_k(t)$. It follows that the random vector $(c_1, c_2, c_3, c_4)$ evolves
as a Markov chain. The transition matrix of this chain can be found from
that of the $X_{ij}$, and is perhaps best written down in terms of the joint
moment-generating function

$$E\Big(\exp\Big\{\sum_{k=1}^{4}\theta_k c_k(t+1)\Big\} \,\Big|\, c_i(t)\Big) = \Big[\sum_{i=1}^{4}\sum_{j=1}^{4} c_i(t)c_j(t)\exp\Big\{\sum_{k=1}^{4} a_{ijk}\theta_k/N\Big\}\Big]^{N}. \tag{3.130}$$

This equation was given by Watterson (1970) and requires the biologically
reasonable definition $a_{ijk} = a_{jik}$.

Before examining the consequences of (3.130) we introduce the RUG
model. Here we ignore the zygote stage in our formation of the model and
simply assume, following (2.84), that if generation $t$ produces $n_i(t)$ gametes
of type $i$ ($i = 1, \ldots, 4$), the probability that generation $t+1$ produces
$n_i(t+1)$ gametes of type $i$ ($i = 1, \ldots, 4$) is

$$\frac{(2N)!}{\prod_{i=1}^{4} n_i(t+1)!}\prod_{i=1}^{4}\psi_i^{\,n_i(t+1)}, \tag{3.131}$$

where

$$\psi_i = c_i(t) + \eta_i R\{c_1(t)c_4(t) - c_2(t)c_3(t)\}, \quad c_i(t) = n_i(t)/(2N),$$

and the $\eta_i$ have been defined in (2.84). The models defined by (3.130) and
(3.131) are not equivalent in general, but are so in the limiting case $R = 0$.
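
One generation of the RUG model (3.131) amounts to a multinomial draw of $2N$ gametes with probabilities $\psi_i$. The sketch below shows this step in Python using only the standard library. It assumes the sign convention $\eta_1 = \eta_4 = -1$, $\eta_2 = \eta_3 = +1$ for the quantities defined in (2.84), which is the convention under which recombination reduces $D$; the function names and the example counts are hypothetical.

```python
import random

# Assumed sign convention for the eta_i of (2.84): gametes 1 and 4 lose
# frequency through recombination when D > 0, gametes 2 and 3 gain it.
ETA = (-1, +1, +1, -1)

def rug_sampling_probs(c, r):
    """Return psi_i = c_i + eta_i * R * (c1 c4 - c2 c3) for i = 1..4."""
    d = c[0] * c[3] - c[1] * c[2]
    return [c[i] + ETA[i] * r * d for i in range(4)]

def rug_next_generation(counts, r, rng):
    """Draw the 2N gamete counts of generation t + 1 from those of generation t."""
    two_n = sum(counts)
    c = [n / two_n for n in counts]
    psi = rug_sampling_probs(c, r)
    draws = rng.choices(range(4), weights=psi, k=two_n)  # multinomial sample
    return [draws.count(i) for i in range(4)]

if __name__ == "__main__":
    rng = random.Random(2024)
    counts = [40, 10, 10, 40]  # hypothetical gamete counts, 2N = 100
    for _ in range(5):
        counts = rug_next_generation(counts, r=0.1, rng=rng)
        print(counts)
```

Because the $\eta_i$ sum to zero the $\psi_i$ sum to one, and for $R = 0$ they reduce to the current gamete frequencies, so the step becomes a four-allele version of the model (1.48).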

The qualitative behaviors of both models are identical. All four gametes 
will segregate in the population, with the possibility that one gamete is 
temporarily absent not excluded, until after a random time whose distri- 
bution is determined by R, N and the initial gamete frequencies, one or 
other allele is lost from the population. From this time on segregation con- 
tinues at one locus only and the one-locus model (1.48) applies. Eventually 
one or other allele at this locus is lost, and the population consists entirely 
of one of the four gametic types. Clearly questions of particular interest 
concern the time that all four gametes exist in the population, the
probability that a nominated gamete is eventually fixed and, because linkage
disequilibrium is of major interest in two-locus systems, the transient
behavior of the coefficient of linkage disequilibrium, defined in generation $t$
by

$$D(t) = c_1(t)c_4(t) - c_2(t)c_3(t). \tag{3.132}$$

We now consider three quantities in order to obtain further information
about the properties of both models. The first is the eigenvalue $\mu$ for which,
for large $t$,

$$\text{Prob(segregation continues at both loci)} \sim \text{const} \cdot \mu^t. \tag{3.133}$$
The second is the probability of ultimate fixation of gamete type $i$, and the
third is the mean $E\{D(t)\}$ and mean square $E\{D(t)\}^2$ of the coefficient of
linkage disequilibrium at time $t$.

To find the eigenvalue $\mu$ defined by (3.133) we follow the approach of
Watterson (1970, 1972). We consider the variable $D(t)$, defined by (3.132),
as well as the variables $S(t)$ and $Z(t)$, defined by

$$S(t) = c_1(t)c_4(t) + c_2(t)c_3(t),$$

$$Z(t) = \{c_1(t) + c_2(t)\}\{c_1(t) + c_3(t)\}\{c_2(t) + c_4(t)\}\{c_3(t) + c_4(t)\}.$$

It is then found that conditional on $c_1(t)$, $c_2(t)$, $c_3(t)$,

$$E\{S(t+1)\} = a_{11}S(t) + a_{12}\{D(t)\}^2 + a_{13}Z(t),$$

$$E\{D(t+1)\}^2 = a_{21}S(t) + a_{22}\{D(t)\}^2 + a_{23}Z(t),$$

$$E\{Z(t+1)\} = a_{31}S(t) + a_{32}\{D(t)\}^2 + a_{33}Z(t).$$

Here the $a_{ij}$ are constants whose values depend only on $N$ and $R$. It is
clear that given the initial values $c_i(0)$,

$$E\begin{pmatrix} S(t) \\ \{D(t)\}^2 \\ Z(t) \end{pmatrix} = A^t \begin{pmatrix} S(0) \\ \{D(0)\}^2 \\ Z(0) \end{pmatrix}, \tag{3.134}$$

where $A = \{a_{ij}\}$. Since $S(t)$ is positive if and only if all four alleles continue
to segregate in generation $t$, a generalization of the method of Appendix A
shows that the leading nonunit eigenvalue of the transition matrix implied
by (3.134) is the leading eigenvalue of $A$. This may be calculated either
algebraically or numerically. Watterson found that the largest eigenvalue $\mu$
can be expressed in terms of a quantity $y$ that is the solution of a certain
cubic equation (see Watterson (1970, 1972b), Littler (1973)).

The most useful discussion of the properties of $\mu$ is through numerical
examples, but some limiting cases are of special interest. Thus $R = 0$
implies $y = 0$ and hence

$$\mu = 1 - (2N)^{-1}.$$

This is to be expected since for $R = 0$ the model is equivalent to a one-locus
model with four alleles for which this eigenvalue has already been
established above. Perhaps of more importance for consideration of properties
of two-locus systems is to consider the behavior of $\mu$ for $R$ moderate and
$N$ large. Here Watterson (1970, 1972) found that

$$\mu = \{1 - (2N)^{-1}\}^2 + O(N^{-3}). \tag{3.135}$$

The value of $\mu$ is very close to $\{1 - (2N)^{-1}\}^2$ when $R$ is not extremely small
and $N$ greater than about 50. Some numerical values showing this, taken
from Littler (1973), are given in Table 3.1.

Table 3.1. Values of the eigenvalue $\mu$ (see text for definition) in both RUG and
RUZ models

                      R
     N      0.01     0.10     0.20     0.50
    10     0.941    0.910    0.905    0.903
    25     0.972    0.961    0.961    0.960
    50     0.984    0.980    0.980    0.980

The reason why the eigenvalue $\mu$ takes the form shown in (3.135) is obvious
enough. When $N$ and $R$ are not both small the two loci behave almost
independently, so that the probability that segregation continues at both
loci is close to the square of the probability that segregation continues at
any one locus. As $R \to 0$ the two segregation behaviors become more dependent.
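
The agreement between the tabulated eigenvalues and the leading term of (3.135) can be confirmed with a few lines of Python. The sketch below compares the $R = 0.50$ column of Table 3.1 with $\{1 - (2N)^{-1}\}^2$; the dictionary simply transcribes that column, and the function name is illustrative.

```python
def squared_one_locus_eigenvalue(n: int) -> float:
    """Leading term of (3.135): {1 - 1/(2N)}^2."""
    return (1.0 - 1.0 / (2.0 * n)) ** 2

# R = 0.50 column of Table 3.1 (values from Littler (1973) as tabulated).
TABLE_R_050 = {10: 0.903, 25: 0.960, 50: 0.980}

if __name__ == "__main__":
    for n, mu in TABLE_R_050.items():
        approx = squared_one_locus_eigenvalue(n)
        print(n, mu, round(approx, 4), round(mu - approx, 4))
```

For unlinked loci the tabulated values agree with the squared one-locus eigenvalue to the three decimal places shown, even for $N = 10$.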

A parallel evaluation of $\mu$ for the RUG model has been made by Littler
(1973). Here Littler sets up equations of the form

$$E[\{D(t+1)\}^2 \mid c_i(t)] = b_{11}\{D(t)\}^2 + b_{12}I(t) + b_{13}Z(t),$$

$$E[I(t+1) \mid c_i(t)] = b_{21}\{D(t)\}^2 + b_{22}I(t) + b_{23}Z(t),$$

$$E[Z(t+1) \mid c_i(t)] = b_{31}\{D(t)\}^2 + b_{32}I(t) + b_{33}Z(t),$$

where

$$I(t) = \{c_1(t)c_4(t) - c_2(t)c_3(t)\}\{1 - 2c_1(t) - 2c_2(t)\}\{1 - 2c_1(t) - 2c_3(t)\},$$

$Z(t)$ is as defined above and the $b_{ij}$ are constants depending only on $N$ and
$R$. This gives, conditional on the values $c_i(0)$,

$$E\begin{pmatrix} \{D(t)\}^2 \\ I(t) \\ Z(t) \end{pmatrix} = B^t \begin{pmatrix} \{D(0)\}^2 \\ I(0) \\ Z(0) \end{pmatrix}, \tag{3.136}$$

where $B = \{b_{ij}\}$. This is analogous to (3.134), and the leading eigenvalue
$\mu$ for this model is the leading eigenvalue of the matrix $B$.

As for the RUZ model, $\mu$ decreases as a function of $R$ from $1 - (2N)^{-1}$
at $R = 0$, but is always greater than $\{1 - (2N)^{-1}\}^2$ for $R < 0.5$ (Karlin
and McGregor (1968)). For the combinations of $N$ and $R$ values listed in
Table 3.1, the numerical values given apply to the order of accuracy shown
for the RUG model also, although the formulas for the eigenvalues in the
two stochastic models are different. For values of $N$ and $R$ not listed the
agreement between the two is not quite so close. Nevertheless, the general
discussion given for the values of $\mu$ in the RUZ model also applies here for
the RUG model.

We now turn to probabilities of fixation for the various gametes.
These were found by Kimura (1963) for the RUZ model and by Karlin
and McGregor (1968) for the RUG model. Suppose we can find functions
$\phi_i\{c_1(t), c_2(t), c_3(t), c_4(t)\}$, which we abbreviate to $\phi_i(t)$, having the
property that

$$E\{\phi_i(t+1) \mid c_1(t), c_2(t), c_3(t), c_4(t)\} = \phi_i(t), \quad i = 1, \ldots, 4, \tag{3.137}$$

where $\phi_i(\infty) = 1$ if gamete $i$ eventually fixes, $\phi_i(\infty) = 0$ otherwise. Then
by iteration in (3.137),

$$E\{\phi_i(\infty) \mid c_1(0), c_2(0), c_3(0), c_4(0)\} = \phi_i(0).$$

The left-hand side is the probability that gamete $i$ fixes. We conclude that if
we can find functions $\phi_i(t)$ satisfying (3.137), gamete fixation probabilities
are given by the values of $\phi_i(0)$. Functions $\phi_i(t)$ satisfying (3.137) always
exist, and can usually be found after some trial and error.

One complication arises with this procedure. The RUZ model concerns
zygotes rather than gametes and the initial composition of the population
then relates to zygote frequencies rather than gamete frequencies.
Nevertheless the essence of the above procedure still applies. For the RUZ model
it is found that functions $\phi_i(t)$ satisfying (3.137) are

$$\phi_i(t) = c_i(t) + \frac{\eta_i 2NR\,D(t)}{2NR + 1},$$

while for the RUG model,

$$\phi_i(t) = c_i(t) + \frac{\eta_i 2NR\,D(t)}{2NR + 1 - R}.$$

It follows immediately for the RUG model that the probability of fixation
of gametes of type $i$ is

$$c_i(0) + \frac{\eta_i 2NR\,D(0)}{2NR + 1 - R}. \tag{3.138}$$

Matters are slightly more complicated for the RUZ model since we
must give probabilities in terms of initial zygotic frequencies. It is found
(Watterson (1970)) that the required value is

$$c_i^* + \frac{\eta_i\{2NR\,D^* + \Delta(2N)^{-1}\}}{2NR + 1},$$

where $c_i^*$ is the frequency of the gamete $i$ among the zygotes of the initial
generation, $D^* = c_1^*c_4^* - c_2^*c_3^*$ and $\Delta = \{X_{14}(0) - X_{23}(0)\}(2N)^{-1}$.

In both cases, whenever $R$ is fixed and moderate and $N$ is large, the
fixation probability of gamete $i$ is approximately $c_i + \eta_i D$, where $c_i$ stands
for $c_i^*$ in the RUZ model and $c_i(0)$ in the RUG model. But this is just the
initial value of the product of the frequencies of the two alleles making up
the $i$th gamete. Thus, to a close approximation, the probability in this case
that any gamete fixes is simply the product of the probabilities that the two
corresponding alleles fix. This arises because the segregation processes at
the two loci are effectively independent. When $R$ is small this is no longer
true, and the association between the two loci must be taken into account
when computing fixation probabilities.
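
The statement that gamete fixation probabilities approach products of allele frequencies can be illustrated with the RUG formula (3.138). The Python sketch below is a minimal illustration under two labeled assumptions: the sign convention $\eta_1 = \eta_4 = -1$, $\eta_2 = \eta_3 = +1$ for the quantities of (2.84), and the gamete labeling $1 = A_1B_1$, $2 = A_1B_2$, $3 = A_2B_1$, $4 = A_2B_2$. The function names and initial frequencies are hypothetical.

```python
ETA = (-1, +1, +1, -1)  # assumed signs of eta_1..eta_4 from (2.84)

def rug_fixation_probability(c0, i, n, r):
    """Formula (3.138) for gamete index i = 0..3 (gametes 1..4)."""
    d0 = c0[0] * c0[3] - c0[1] * c0[2]
    return c0[i] + ETA[i] * 2 * n * r * d0 / (2 * n * r + 1 - r)

def allele_frequency_product(c0, i):
    """Product of the initial frequencies of the two alleles in gamete i.

    Assumes gametes are ordered A1B1, A1B2, A2B1, A2B2.
    """
    p_a1 = c0[0] + c0[1]
    p_b1 = c0[0] + c0[2]
    freq_a = p_a1 if i in (0, 1) else 1.0 - p_a1
    freq_b = p_b1 if i in (0, 2) else 1.0 - p_b1
    return freq_a * freq_b

if __name__ == "__main__":
    c0 = [0.4, 0.1, 0.1, 0.4]  # hypothetical initial gamete frequencies, D(0) = 0.15
    for r in (0.0, 0.001, 0.5):
        probs = [rug_fixation_probability(c0, i, n=1000, r=r) for i in range(4)]
        print(r, [round(p, 4) for p in probs])
    print([allele_frequency_product(c0, i) for i in range(4)])
```

With $R = 0$ the fixation probabilities are the initial gamete frequencies, while for unlinked loci and $N = 1000$ they are very close to the allele frequency products.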

We turn finally to the behavior of the linkage disequilibrium function
$D(t)$. A considerable part of two-locus theory relates to this quantity, so
it is important to discuss its behavior in detail for the two models under
consideration.

Consider first the model (3.130). It is easy to show for this model that

$$E\{D(t+1) \mid D(t)\} = \{1 - (2N)^{-1} - R\}D(t), \tag{3.139}$$

and hence the mean value of $D(t)$ decreases to zero geometrically fast with
$t$ at rate $1 - (2N)^{-1} - R$. Unless $R$ is close to zero this is quite a rapid rate.
A parallel remark applies for the model (3.131), where we find

$$E\{D(t+1) \mid D(t)\} = \{1 - (2N)^{-1}\}\{1 - R\}D(t).$$

The convergence to zero of $E\{D(t)\}$ is only slightly slower for this model
than that for the model in (3.130).
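
Iterating the two conditional expectations gives $E\{D(t)\}$ as a geometric function of $t$ in each model. The short Python sketch below compares the two decay rates; the parameter values and function names are illustrative assumptions.

```python
def mean_d_ruz(d0: float, n: int, r: float, t: int) -> float:
    """E{D(t)} in the RUZ model, by iterating (3.139)."""
    return (1.0 - 1.0 / (2.0 * n) - r) ** t * d0

def mean_d_rug(d0: float, n: int, r: float, t: int) -> float:
    """E{D(t)} in the RUG model, by iterating its one-step expectation."""
    return ((1.0 - 1.0 / (2.0 * n)) * (1.0 - r)) ** t * d0

if __name__ == "__main__":
    d0, n, r = 0.15, 100, 0.1  # hypothetical initial value and parameters
    for t in (0, 10, 25, 50):
        print(t, mean_d_ruz(d0, n, r, t), mean_d_rug(d0, n, r, t))
```

The RUG rate exceeds the RUZ rate by exactly $R(2N)^{-1}$, which is why the two means are nearly indistinguishable unless $N$ is very small.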

More detailed information about the behavior of $D(t)$ will depend on
knowledge of the variance of $D(t)$, or equivalently, since the above
expressions easily yield the mean of $D(t)$, on the expected mean square $E\{D(t)\}^2$.
Suppose that the initial value of $D$ is zero. Since $D = 0$ once fixation of
one or other allele occurs, we might expect that in this case the variance
of $D$ will increase from zero as $t$ increases and, after achieving a maximum
for intermediate values of $t$, decrease again to zero. If initially $D$ is nonzero
we might perhaps expect the variance of $D$ monotonically to decrease to
zero.

Although the behavior of the variance of $D$ is by no means simple,
these expectations are in essence confirmed for the RUG model by Hill
and Robertson (1968) and for both models by Littler (1973). Equations
(3.134) and (3.136) show that $E\{D(t)\}^2$ can be written in the form

$$E\{D(t)\}^2 = \alpha_1\mu_1^t + \alpha_2\theta_1^t + \alpha_3\theta_2^t$$

for the RUZ model and

$$E\{D(t)\}^2 = \beta_1\mu_2^t + \beta_2\theta_3^t + \beta_3\theta_4^t$$

for the RUG model. Here the $\alpha_i$, $\beta_i$ and $\theta_i$ are constants depending on
$N$ and $R$ and $\mu_1$, $\mu_2$ are the leading eigenvalues of $A$ and $B$, respectively.
While for large $t$ the behavior of $E\{D(t)\}^2$ in both cases is determined
largely by the maximum eigenvalue $\mu_i$, the behavior for small $t$ is quite
complicated and all eigenvalues and eigenvectors are needed to describe
it. To gain more information about the transient behavior of $D(t)$, Littler
(1973) investigated the behavior of $E\{D(t)\}^2$ as a function of $t$. When $D$ is
initially zero, the variance of $D$ increases to a maximum value of order 0.01
and then decreases. The maximum is reached sooner and is slightly greater
for small values of $N$. For large $t$, the variance of $D$ is smallest for small
values of $N$. When $D$ is not zero initially, $E\{D(t)\}^2$ decreases with $t$, and
for large $t$ is minimized for small values of $N$. The eigenvalue $\mu$ determining
the ultimate behavior of $E\{D(t)\}^2$ is much closer to unity than the value
$1 - (2N)^{-1} - R$ implied by (3.139) for the ultimate behavior of $E\{D(t)\}$.
Because of this it has sometimes been asserted that observed values of $D$
can differ significantly from the mean. In view of the above results this
conclusion may not be drawn, and it is clear that the eigenvalues on their
own do not give a complete picture of the true behavior of $D(t)$.

In the RUG model, E{D(t)} 2 can be found by a procedure similar to that 
used in the RUZ model. Numerical examples have been given by Hill and 
Robertson (1968). They found that if all gametes have initial frequency 
0.25 and $NR = 1$, var$\{D(t)\}$ reaches a maximum of 0.006 at about $N$
generations, while when NR = 4 a maximum of 0.003 is reached after about 
N generations. In general the larger NR , the smaller the maximum and the 
sooner it is reached. For large populations and unlinked loci, var {D(t)} is 
always extremely small, indicating that by random effects only, it is unlikely 
that D will ever assume a large value. 

All the above has assumed that there is no mutation, so that eventual 
fixation of one or other gamete is certain. If mutation exists at positive rate 
from $A_i$ to $A_j$ ($i \neq j$) and $B_i$ to $B_j$ ($i \neq j$) there will exist a stationary
distribution of gamete frequencies and thus a stationary distribution of $D$.
Since we are interested in the extent of likely variation of $D$ from zero,
we consider now the stationary mean value of $D^2$ in this mutation case,
following the analysis of Ohta and Kimura (1969b).

Suppose that the mutation rates are $u_1$ (from $A_1$ to $A_2$), $v_1$ (from $A_2$ to
$A_1$), $u_2$ (from $B_1$ to $B_2$) and $v_2$ (from $B_2$ to $B_1$). We consider the three
quantities $Z(t)$, $I(t)$ and $\{D(t)\}^2$ introduced above. Ohta and Kimura set
up a recurrence relation similar to that given above for the case without
mutation, where the coefficients $b_{ij}$ now include mutation terms. By letting
$t \to \infty$, nondegenerate limits are found for the expectations of these
quantities and in particular for $E\{D(\infty)\}^2$. We defer giving an explicit formula
here since a slightly simpler formula will be given in Section 6.6, based 
upon diffusion theory. 




4 

Diffusion Theory 



4.1 Introduction 

In the previous chapter we encountered some difficulty in deriving explicit 
formulas for several quantities of evolutionary interest, particularly when 
the population behavior was described by the Wright-Fisher model (1.48) 
or any of its generalizations. Even for models such as (3.30), where ex- 
plicit formulas can often be found, the effects of the genetic parameters 
are sometimes obscured by the complexities of the expressions that arise. 
For both these reasons, it would be most useful to us if we could find ap- 
proximate formulae for these quantities by reasonably accurate expressions 
which are not only comparatively simple, but which also display explicitly 
the effects of the various genetic parameters involved. Fortunately there 
exists a general approach which very often does all this for us, namely in 
approximating the discrete process by a continuous-time continuous-space 
diffusion process. 

A substantial and mathematically deep theory of diffusion processes ex- 
ists. We outline those aspects of this theory that are of use to us in Section 
4.7. Our approach to diffusion processes does not, however, proceed through 
this theory, being often rather intuitive and avoiding theoretical niceties. 
This is in part because for us the fundamental process is always a discrete 
one, usually a finite Markov chain, for which some of these niceties are 
irrelevant, and in part because the mathematical depth of formal diffusion 
theory is inappropriate to the level of this book. We shall in particular 
assume without question the existence of a unique diffusion process having
certain properties that we require. 

Diffusion theory has a long and honorable place in population genetics 
theory, going back to Fisher (1922). In this chapter we consider the elements 
of the theory divorced from specific genetical applications, and in Chapter
5 the theory developed here will be applied to a variety of genetical models.



4.2 The Forward and Backward Kolmogorov 
Equations 

We consider a discrete Markov chain with state space $\{0, 1, 2, \ldots, M\}$,
transition matrix $P = \{p_{ij}\}$ and initial value $k$ for the random variable whose
properties are described by this Markov chain. This notation will be used
throughout this chapter. For convenience we write the $t$-step transition
probability $p_{ki}^{(t)}$ as $f(i; k, t)$, so that

$$f(j; k, t+1) = \sum_i f(i; k, t)\,p_{ij}. \tag{4.1}$$

We re-scale the space axis by a factor $M^{-1}$ and consider the new variables

$$x = iM^{-1}, \quad x + \delta x = jM^{-1}, \tag{4.2}$$

and write $p = kM^{-1}$. In all applications of interest to us, $E(\delta x \mid x) =
O(M^{-\gamma})$ and $\operatorname{var}(\delta x \mid x) = O(M^{-\gamma})$, where $\gamma = 1$ or 2; we now change the
time scale so that possible changes in the random variable can occur at
time points $\delta t, 2\delta t, 3\delta t, \ldots$, where $\delta t = M^{-\gamma}$. The re-scaled process is of
course essentially identical to the original process and in particular is still a
discrete process. Nevertheless we feel that as $M \to \infty$ the process converges
in some way to a continuous-time continuous-space diffusion process, and
our aim is to identify this diffusion process and to discover some of its
properties.

Suppose that in the discrete process the moments of the change $\delta x$, given
the current value $x$ at time $t$, satisfy the equations

$$E(\delta x) = a(x)\delta t + o(\delta t), \tag{4.3}$$

$$\operatorname{var}(\delta x) = b(x)\delta t + o(\delta t), \tag{4.4}$$

$$E(|\delta x|^3) = o(\delta t). \tag{4.5}$$

Here $a(x)$ and $b(x)$ are assumed to be functions of $x$ but not of $t$. We write
(4.1) in the form

$$f(x + \delta x; p, t + \delta t) = \int f(x; p, t)\,f(x + \delta x; x, \delta t)\,dx,$$

where here and below all integrals have terminals 0 and 1. We now formally
expand on both sides as Taylor series in $\delta t$ and $\delta x$. Using (4.3)-(4.5) and
retaining leading terms only, we eventually arrive at the equation

$$\frac{\partial f(x;t)}{\partial t} = -\frac{\partial}{\partial x}\{a(x)f(x;t)\} + \frac{1}{2}\frac{\partial^2}{\partial x^2}\{b(x)f(x;t)\}. \tag{4.6}$$

This is the forward Kolmogorov (Fokker-Planck or diffusion) equation and
is of fundamental importance in the theory of population genetics. This
formal procedure can be justified by the mathematical theory referred to
briefly in Section 4.7.

Since $\delta t \to 0$ corresponds to $M \to \infty$, we now assume that there
exists a diffusion process on [0, 1] that satisfies (4.3)-(4.5) and possesses
a density function $f(x;t)$ which satisfies (4.6). We expect this process to
approximate the original discrete process in the sense that for
$0 < g < h < 1$,

$$\int_g^h f(x;t)\,dx \tag{4.7}$$

provides a good approximation to the probability that the original unscaled
discrete random variable is between $Mg$ and $Mh$ at time $M^{\gamma} t$.

In the procedure leading to (4.6), little mention was made of the
initial value $p$ of the diffusion variable, and $p$ does not appear explicitly in
(4.6). However, the function $f(x;t)$ should be written more fully as $f(x;p,t)$,
since the solution of the equation depends on the value of $p$. There is,
however, a second equation that makes a more explicit and indeed fundamental
use of the value of $p$. If we consider instead of the time points
$(0, t, t + \delta t)$ the new time points $(0, \delta t, t + \delta t)$, we arrive
at the equation

$$f(x; p, t + \delta t) = \int g(\delta p; p)\, f(x; p + \delta p, t)\, d(\delta p). \tag{4.8}$$

Here $\delta p$ is the change in the value of the random variable in the time
interval $(0, \delta t)$ and $g(\delta p; p)$ is its probability density function.
Expanding the integrand as above and retaining leading terms, we arrive at the
equation

$$\frac{\partial f(x;p,t)}{\partial t} = a(p)\frac{\partial f(x;p,t)}{\partial p} + \frac{1}{2}\, b(p)\frac{\partial^2 f(x;p,t)}{\partial p^2}. \tag{4.9}$$

This is the backward Kolmogorov equation, which for several purposes is
more useful than the forward equation (4.6).

Some care must be exercised in the interpretation of (4.9). As stated 
above, the density function f(x;p,t) depends on p, and all that is claimed 
is that as a function of p, this density function satisfies (4.9). The statement 
sometimes made that (4.9) implies a time reversal and that p is a random 
variable with x fixed is incorrect: The random variable in (4.9) is the current 
gene frequency x. 

An explicit solution of (4.6), or of (4.9), can sometimes be achieved, as
we see in the next chapter. The solution is usually of the eigenfunction
expansion form

$$f(x;p,t) = \sum_{i=1}^{\infty} g_i(x,p) \exp(-\lambda_i t), \tag{4.10}$$

where the $\lambda_i$ ($0 < \lambda_1 < \lambda_2 < \lambda_3 \cdots$) are eigenvalue
constants and the $g_i(x,p)$ the associated eigenfunctions. This form of solution
is clearly analogous to (2.138), a parallel we examine in more detail in
particular cases. Remarkably, a considerable amount of information concerning
the diffusion process (4.6) can be found without computing the explicit
solution (4.10), as we now see.



4.3 Fixation Probabilities 

In this and the next three sections we assume without question the existence 
of a diffusion process on [0, 1] satisfying (4.3)-(4.5) and admitting a density 
function satisfying (4.6) and (4.9). 

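An equation parallel to (4.9) can be found by replacing $f(x;p,t)$ by
$F(x;p,t)$ throughout, where

$$F(x;p,t) = \int_0^x f(y;p,t)\,dy, \tag{4.11}$$

so that

$$\frac{\partial F(x;p,t)}{\partial t} = a(p)\frac{\partial F(x;p,t)}{\partial p} + \frac{1}{2}\, b(p)\frac{\partial^2 F(x;p,t)}{\partial p^2}. \tag{4.12}$$

Suppose now that both $x = 0$ and $x = 1$ are absorbing states of the diffusion
process. From (4.12) we arrive at the equation

$$\frac{\partial P_0(p;t)}{\partial t} = a(p)\frac{\partial P_0(p;t)}{\partial p} + \frac{1}{2}\, b(p)\frac{\partial^2 P_0(p;t)}{\partial p^2}, \tag{4.13}$$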



where $P_0(p;t)$ is the probability that absorption has occurred at $x = 0$ at
or before time $t$. The same equation holds for the probability $P_1(p;t)$ that
absorption has occurred at $x = 1$ at or before time $t$. Although $P_0(p;t)$ and
$P_1(p;t)$ obey the same equation, their values differ because of different
boundary conditions. Letting $t \to \infty$, we find that the probability
$P_0(p)$ that absorption ever occurs at $x = 0$ satisfies the equation

$$0 = a(p)\frac{dP_0(p)}{dp} + \frac{1}{2}\, b(p)\frac{d^2 P_0(p)}{dp^2}. \tag{4.14}$$

Since $P_0(p)$ clearly satisfies the boundary conditions $P_0(0) = 1$, $P_0(1) = 0$,
it is straightforward to solve (4.14) explicitly to get

$$P_0(p) = \int_p^1 \psi(y)\,dy \Big/ \int_0^1 \psi(y)\,dy, \tag{4.15}$$

where

$$\psi(y) = \exp\Big(-2\int^y \{a(z)/b(z)\}\,dz\Big). \tag{4.16}$$

Similarly the probability $P_1(p)$ that absorption eventually occurs at $x = 1$
is found to be

$$P_1(p) = \int_0^p \psi(y)\,dy \Big/ \int_0^1 \psi(y)\,dy. \tag{4.17}$$

We have already found these formulas as approximations to the values 
in a finite Markov chain in (3.30) and (3.31), where a different notation 
was used, and without reference to diffusion processes. Although we have 
carried out a scaling of the time axis in passing from the original Markov 
chain to the diffusion process, there is no need to re-scale the values (4.15) 
and (4.17) when using them as approximations in the Markov chain. This 
is no longer true for questions concerning the time until absorption, as we 
now see. 
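The formulas (4.15)-(4.17) are easy to evaluate numerically. The sketch below
does this by Simpson's rule for a user-supplied ratio $a(z)/b(z)$. The function
names, the choice of 0 as the lower terminal in (4.16) (which cancels in the
ratio) and the two test cases are assumptions made for illustration: the case
$a(z)/b(z) = 0$ should give $P_1(p) = p$, and a constant ratio $a(z)/b(z) = c$
gives $\psi(y) = e^{-2cy}$ and hence $P_1(p) = (1 - e^{-2cp})/(1 - e^{-2c})$.

```python
import math

def integrate(fn, lo, hi, n=200):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = fn(lo) + fn(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * fn(lo + i * h)
    return total * h / 3

def make_psi(a_over_b):
    """psi(y) = exp(-2 * integral_0^y a(z)/b(z) dz), as in (4.16).

    The lower terminal 0 is an arbitrary choice; it cancels in (4.15) and (4.17).
    """
    def psi(y):
        if y == 0:
            return 1.0
        return math.exp(-2.0 * integrate(a_over_b, 0.0, y))
    return psi

def fixation_probability(p, a_over_b):
    """P_1(p) from (4.17); P_0(p) is then 1 - P_1(p) by (4.15)."""
    psi = make_psi(a_over_b)
    return integrate(psi, 0.0, p) / integrate(psi, 0.0, 1.0)

# Assumed test cases: zero drift, and a constant ratio a(z)/b(z) = 1.
p1_zero_drift = fixation_probability(0.3, lambda z: 0.0)
p1_constant_ratio = fixation_probability(0.1, lambda z: 1.0)
```

Passing the ratio $a(z)/b(z)$ as a single function, rather than $a$ and $b$
separately, avoids a 0/0 evaluation at the boundaries when $b$ vanishes there.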



4.4 Absorption Time Properties 

We start by assuming that both $x = 0$ and $x = 1$ are absorbing barriers
and consider the mean time until one or other boundary is reached in
the diffusion process of interest. Equation (4.13) and the corresponding
equation for $x = 1$ show that if $\phi(t;p)$ is the density function of the
time $t$ until absorption occurs, then $\phi(t;p)$ satisfies the equation

$$\frac{\partial \phi(t;p)}{\partial t} = a(p)\frac{\partial \phi(t;p)}{\partial p} + \frac{1}{2}\, b(p)\frac{\partial^2 \phi(t;p)}{\partial p^2}. \tag{4.18}$$

Then

$$\begin{aligned}
-1 &= -\int_0^\infty \phi(t;p)\,dt \\
   &= -[t\phi(t;p)]_0^\infty + \int_0^\infty t\,\frac{\partial \phi(t;p)}{\partial t}\,dt \\
   &= -0 + \int_0^\infty t\Big\{a(p)\frac{\partial \phi(t;p)}{\partial p} + \frac{1}{2}\, b(p)\frac{\partial^2 \phi(t;p)}{\partial p^2}\Big\}\,dt,
\end{aligned}$$

so that

$$-1 = a(p)\frac{d\bar{t}(p)}{dp} + \frac{1}{2}\, b(p)\frac{d^2\bar{t}(p)}{dp^2}, \tag{4.19}$$

provided that an interchange in the order of integration and differentiation is
justified, that the mean fixation time is finite, and that $t\phi(t;p) \to 0$ as
$t \to \infty$. Here

$$\bar{t}(p) = \int_0^\infty t\phi(t;p)\,dt \tag{4.20}$$

is the mean time until one or other absorbing boundary is reached, given
the initial frequency $p$. The solution of (4.19), subject to the boundary
conditions $\bar{t}(0) = \bar{t}(1) = 0$, is best expressed in the form

$$\bar{t}(p) = \int_0^1 t(x;p)\,dx, \tag{4.21}$$

where

$$t(x;p) = 2P_0(p)[b(x)\psi(x)]^{-1}\int_0^x \psi(y)\,dy, \qquad 0 < x < p, \tag{4.22}$$

$$t(x;p) = 2P_1(p)[b(x)\psi(x)]^{-1}\int_x^1 \psi(y)\,dy, \qquad p < x < 1. \tag{4.23}$$

For the original Markov chain we approximate the mean absorption time by

$$M^{\gamma}\,\bar{t}(p). \tag{4.24}$$

The representation (4.21) suggests a more detailed examination of the
function $t(x;p)$. This function has the interpretation that

$$\int_{x_1}^{x_2} t(x;p)\,dx \tag{4.25}$$

is the mean time in the diffusion process that the random variable spends
in the interval $(x_1, x_2)$ before absorption. Correspondingly, we approximate
the mean number of times in the Markov chain that the discrete random
variable takes the value $j$ $(= Mx)$ before absorption by

$$\bar{t}_{kj} \approx M^{\gamma - 1}\, t(x;p). \tag{4.26}$$

The representation (4.25) allows further conclusions to be drawn. Let $g(x)$
be any well-behaved function of $x$ and consider the integral $I_g(p)$ of this
function over the time until absorption occurs. This integral is a random
variable, since its value will depend on the actual path traced out by the
diffusion variable, and its mean value from (4.25) is clearly

$$E(I_g(p)) = \int_0^1 g(x)\,t(x;p)\,dx. \tag{4.27}$$

In a similar way, if $\sum g(x)$ is the sum of the function $g(x)$ in the discrete
process,

$$E\Big(\sum g(x)\Big) \approx M^{\gamma}\int_0^1 g(x)\,t(x;p)\,dx. \tag{4.28}$$
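The sojourn-time representation (4.21)-(4.23) can be checked numerically in a
simple case. The sketch below assumes $a(x) = 0$ and $b(x) = x(1 - x)$ (the
selectively neutral case used later in the chapter), so that $\psi = 1$,
$P_0(p) = 1 - p$ and $P_1(p) = p$; integrating (4.22) and (4.23) by hand then
gives $\bar{t}(p) = -2\{p\log p + (1 - p)\log(1 - p)\}$. The function names and
the midpoint-rule quadrature are illustrative choices, not part of the text.

```python
import math

def sojourn_density(x, p, b, psi, Psi):
    """t(x; p) from (4.22)-(4.23); Psi(x) is the integral of psi from 0 to x."""
    total = Psi(1.0)
    if x <= p:
        P0 = (total - Psi(p)) / total              # (4.15)
        return 2.0 * P0 * Psi(x) / (b(x) * psi(x))
    P1 = Psi(p) / total                            # (4.17)
    return 2.0 * P1 * (total - Psi(x)) / (b(x) * psi(x))

def mean_absorption_time(p, b, psi, Psi, n=20000):
    """t-bar(p) from (4.21), by the midpoint rule (never evaluates x = 0 or 1)."""
    h = 1.0 / n
    return h * sum(sojourn_density((i + 0.5) * h, p, b, psi, Psi)
                   for i in range(n))

# Assumed example: a(x) = 0, b(x) = x(1 - x), hence psi = 1 and Psi(x) = x.
b = lambda x: x * (1.0 - x)
psi = lambda x: 1.0
Psi = lambda x: x

def neutral_closed_form(p):
    """Hand-integrated value of (4.21) in the assumed neutral case."""
    return -2.0 * (p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

numeric = mean_absorption_time(0.5, b, psi, Psi)
exact = neutral_closed_form(0.5)
```

By (4.24), the corresponding approximation for the original Markov chain is
obtained by multiplying the diffusion value by $M^{\gamma}$.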



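There is an alternative way of deriving (4.27), akin to the derivation
of the backward equation (4.9). We note that the integral of $g$ over the time
interval $(0, \delta t)$ is approximately $g(p)\delta t$, so that writing
$E[I_g(p)]$ as $\mu(p)$ for convenience,

$$\begin{aligned}
\mu(p) &\approx g(p)\,\delta t + E\Big[\int_0^1 g(x)\,t(x; p + \delta p)\,dx\Big] \\
       &= g(p)\,\delta t + \mu(p) + a(p)\,\delta t\,\frac{d\mu(p)}{dp} + \frac{1}{2}\, b(p)\,\delta t\,\frac{d^2\mu(p)}{dp^2} + o(\delta t).
\end{aligned}$$

Dividing by $\delta t$ and letting $\delta t \to 0$, we get

$$a(p)\frac{d\mu(p)}{dp} + \frac{1}{2}\, b(p)\frac{d^2\mu(p)}{dp^2} = -g(p). \tag{4.29}$$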



This equation generalizes (4.19) and may be solved, subject to the boundary
conditions $\mu(0) = \mu(1) = 0$, to derive (4.27).

It is possible to derive higher moments of the absorption time and more
generally of $I_g(p)$. For the absorption time we have

$$\begin{aligned}
-2\bar{t}(p) &= -2\int_0^\infty t\phi(t;p)\,dt \\
  &= -[t^2\phi(t;p)]_0^\infty + \int_0^\infty t^2\,\frac{\partial \phi(t;p)}{\partial t}\,dt \\
  &= \int_0^\infty \Big\{a(p)\frac{\partial\, t^2\phi(t;p)}{\partial p} + \frac{1}{2}\, b(p)\frac{\partial^2\, t^2\phi(t;p)}{\partial p^2}\Big\}\,dt \\
  &= a(p)\frac{dS(p)}{dp} + \frac{1}{2}\, b(p)\frac{d^2 S(p)}{dp^2},
\end{aligned} \tag{4.30}$$

where $S(p)$ is the second moment of the absorption time. In this procedure
we have formally interchanged the order of integration and differentiation.

Equation (4.30) can be solved for $S(p)$, subject to the boundary conditions
$S(0) = S(1) = 0$, and hence a formula for the variance of the
absorption time can be found. Clearly this procedure can be generalized to
find any moment of the absorption time, but the formulas become complicated
and we present here only an expression for the variance $\sigma^2(p)$. This
is

$$\sigma^2(p) = 4\Big[P_1(p)\int_p^1 \psi(x)\int_0^x \xi(y)\,dy\,dx - P_0(p)\int_0^p \psi(x)\int_0^x \xi(y)\,dy\,dx\Big] - \{\bar{t}(p)\}^2, \tag{4.31}$$

where $\psi(x)$ has been defined in (4.16) and

$$\xi(x) = [b(x)\psi(x)]^{-1}\,\bar{t}(x). \tag{4.32}$$

It is also possible to find higher moments of the random variable $I_g(p)$.
This has been done by Nagylaki (1974a), and we here only outline the
method. Denoting the $n$th moment of this variable by $\mu^{(n)}(p)$, the
successive moments satisfy the recurrence relation

$$a(p)\frac{d\mu^{(n)}(p)}{dp} + \frac{1}{2}\, b(p)\frac{d^2\mu^{(n)}(p)}{dp^2} = -n\,g(p)\,\mu^{(n-1)}(p) \tag{4.33}$$



and the boundary condition $\mu^{(n)}(0) = \mu^{(n)}(1) = 0$. For $n = 1$ (with
$\mu^{(0)}(p) = 1$) this reduces to (4.29), and for $n = 2$ it generalizes (4.30);
higher moments may be found from (4.33) by iteration. In particular by
choosing $g(x) = 1$ $(0 < x < 1)$, $g(x) = 0$ otherwise, higher
moments of the absorption time can be found from (4.33).

We now use the diffusion process to find an approximation for the
distribution of the sojourn time in any state of the Markov chain. Equation
(2.146) shows that the distribution of the sojourn time depends on two
parameters (there denoted by $a_{kj}$ and $r_j$), and (2.147) shows how these
parameters are related to the mean of the sojourn time. Suppose first we
wish to approximate the distribution of the sojourn time at $j$, where $j > k$,
$k$ being the initial value. The parameter $a_{kj}$ is the probability that state
$\{j\}$ is ever reached and equation (4.17) is readily adapted to give

$$a_{kj} \approx \int_0^p \psi(y)\,dy \Big/ \int_0^x \psi(y)\,dy, \tag{4.34}$$

where $k = pM$, $j = xM$, and the drift and diffusion coefficients $a(y)$ and
$b(y)$, needed to calculate $\psi(y)$, are associated with the diffusion process
approximating the Markov chain. We approximate $r_j$ by using (2.147) and
(4.26) to find

$$a_{kj}/(1 - r_j) \approx M^{\gamma - 1}\, t(x;p). \tag{4.35}$$

Combining (4.34) and (4.35), the sojourn time distribution is given by
(2.146), where $a_{kj}$ is approximated by (4.34) and $r_j$ by

$$r_j \approx 1 - \tfrac{1}{2} M^{1-\gamma}\, b(x)\psi(x)\int_0^1 \psi(y)\,dy \Big/ \Big(\int_0^x \psi(y)\,dy \int_x^1 \psi(y)\,dy\Big). \tag{4.36}$$

When $j < k$, (2.147) continues to hold, but here we approximate $a_{kj}$ by

$$a_{kj} \approx \int_p^1 \psi(y)\,dy \Big/ \int_x^1 \psi(y)\,dy. \tag{4.37}$$

The approximation (4.36) remains unchanged.

All of the above formulas require modification when there is only one
absorbing state. We do not go into details here and only state the
conclusions. If {0} is the only absorbing state, (4.21) continues to hold, but
$t(x;p)$ must be redefined as

$$t(x;p) = 2\,(b(x)\psi(x))^{-1}\int_0^x \psi(y)\,dy, \qquad 0 < x < p, \tag{4.38}$$

$$t(x;p) = 2\,(b(x)\psi(x))^{-1}\int_0^p \psi(y)\,dy, \qquad p < x < 1. \tag{4.39}$$

Similarly, when {1} is the only absorbing state, we have

$$t(x;p) = 2\,(b(x)\psi(x))^{-1}\int_p^1 \psi(y)\,dy, \qquad 0 < x < p, \tag{4.40}$$

$$t(x;p) = 2\,(b(x)\psi(x))^{-1}\int_x^1 \psi(y)\,dy, \qquad p < x < 1. \tag{4.41}$$

In both cases (4.24)-(4.29) hold, except that revised boundary conditions
are needed for (4.29) to produce the solution (4.27), where now $t(x;p)$ is
given either by (4.38) and (4.39) or by (4.40) and (4.41).



4.5 The Stationary Distribution 



We have assumed above that the Markov chain of interest has at least one
absorbing state. In several cases of interest there are
no absorbing states, and there exists a stationary distribution $\{\phi_j\}$
satisfying (2.156) and given implicitly as the solution to (2.157). Since an
explicit expression for this distribution has not been found in many examples
of genetic interest, we aim in this section to approximate this distribution by
finding the stationary distribution of the approximating diffusion process.
It will turn out that this leads to a very simple form for this approximating
distribution in which the effects of the general parameters are clearly
displayed.

Our starting point is the forward Kolmogorov equation (4.6). If we
integrate throughout formally with respect to $x$, there results eventually

$$\frac{\partial}{\partial t}\{1 - F(x;t)\} = a(x) f(x;t) - \frac{1}{2}\frac{\partial \{b(x) f(x;t)\}}{\partial x}. \tag{4.42}$$

Here $F(x;t)$ is the distribution function

$$F(x;t) = \int_0^x f(y;t)\,dy. \tag{4.43}$$

This formal derivation suggests that the right-hand side in (4.42) is the rate
of flow of probability (from left to right) across the point $x$ at time $t$. This
interpretation can be verified, and we thus call the right-hand side in (4.42)
the probability flux of the diffusion process. If a stationary distribution
$f(x)$ exists this probability flux will be zero if $f(x;t)$ is replaced by $f(x)$,
so that the stationary distribution satisfies the equation

$$-a(x) f(x) + \frac{1}{2}\frac{d\{b(x) f(x)\}}{dx} = 0. \tag{4.44}$$

Integration shows that the solution of this equation is

$$f(x) = \mathrm{const}\,[b(x)]^{-1}\exp\Big(2\int^x a(y)/b(y)\,dy\Big), \tag{4.45}$$

where the constant is chosen so that

$$\int_0^1 f(x)\,dx = 1. \tag{4.46}$$

So far as the original Markov chain is concerned, our interpretation is that
the diffusion approximation to the stationary probability that the random
variable in the Markov chain lies in $[Mx_1, Mx_2]$ is given by

$$\mathrm{Prob}(Mx_1 \le X \le Mx_2) \approx \int_{x_1}^{x_2} f(x)\,dx. \tag{4.47}$$

This approximation turns out to be satisfactory except when $x_1 \approx 0$ or
$x_2 \approx 1$, in which case special arguments, which we shall consider later,
are needed.
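As a worked illustration of (4.45), suppose (as an assumed example, not one
taken from this section) that the drift is linear, $a(x) = -\beta_1 x +
\beta_2(1 - x)$ with $\beta_1, \beta_2 > 0$, and that $b(x) = x(1 - x)$. The
integral in (4.45) can then be done in closed form:

```latex
\frac{a(y)}{b(y)} = \frac{\beta_2}{y} - \frac{\beta_1}{1 - y},
\qquad
2\int^x \frac{a(y)}{b(y)}\,dy = 2\beta_2 \log x + 2\beta_1 \log(1 - x),
\\[4pt]
f(x) = \mathrm{const}\; x^{2\beta_2 - 1}(1 - x)^{2\beta_1 - 1},
\qquad
\mathrm{const} = \frac{\Gamma(2\beta_1 + 2\beta_2)}{\Gamma(2\beta_1)\,\Gamma(2\beta_2)} .
```

Under these assumptions the stationary distribution is a beta distribution,
with the constant fixed by the normalization (4.46).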



4.6 Conditional Processes 

In this section we consider diffusion processes where 0 and 1 are both ab- 
sorbing barriers. It is often of interest to single out those diffusions for which 
a nominated absorbing barrier is eventually reached, and we do this by the 
theory of conditional processes. For definiteness we assume the barrier in 
question is x = 1, although we shall also give some formulas applying when 
it is x = 0. 

Since there can be no stationary distribution for such conditional pro- 
cesses, and since also there is no interest in fixation probabilities, interest 
centers almost entirely on properties of the time until fixation. Regarding 
the diffusion as an approximation to a Markov chain, it is clear from (2.153) 
that the sojourn time function (4.22) and (4.23) should be replaced by 

$$t^*(x;p) = t(x;p)\,P_1(x)/P_1(p). \tag{4.48}$$

This gives

$$t^*(x;p) = 2P_0(p)P_1(x)[P_1(p)\,b(x)\psi(x)]^{-1}\int_0^x \psi(y)\,dy, \qquad 0 < x < p, \tag{4.49}$$

$$t^*(x;p) = 2P_1(x)[b(x)\psi(x)]^{-1}\int_x^1 \psi(y)\,dy, \qquad p < x < 1. \tag{4.50}$$

We consistently use the asterisk notation (*) to denote functions computed
conditional on eventual absorption at $x = 1$ and, below, the double asterisk
notation (**) when conditioning on eventual absorption at $x = 0$. Thus
conditional on eventual absorption at $x = 0$, the sojourn time function is,
by arguments parallel to those just given,

$$t^{**}(x;p) = 2P_0(x)[b(x)\psi(x)]^{-1}\int_0^x \psi(y)\,dy, \qquad 0 < x < p, \tag{4.51}$$

$$t^{**}(x;p) = 2P_0(x)P_1(p)[P_0(p)\,b(x)\psi(x)]^{-1}\int_x^1 \psi(y)\,dy, \qquad p < x < 1. \tag{4.52}$$

Equation (2.152) suggests an even stronger result than these, namely that
the conditional density functions $f^*(x;p,t)$ and $f^{**}(x;p,t)$ of the diffusion
variable at time $t$ satisfy

$$f^*(x;p,t) = f(x;p,t)\,P_1(x)/P_1(p), \tag{4.53}$$

$$f^{**}(x;p,t) = f(x;p,t)\,P_0(x)/P_0(p). \tag{4.54}$$

It is clear that (4.49)-(4.52) can be used immediately to find the
conditional mean times before absorption. These were originally found by
Kimura and Ohta (1969) by a method other than that just outlined. We
now indicate a third way in which these conditional mean times can be
derived, namely by finding the conditional process analogues to the
Kolmogorov equations (4.6) and (4.9). To do this we must find the conditional
process drift and diffusion coefficients analogous to those defined by (4.3)
and (4.4). Let $A$ be the event that absorption eventually occurs at $x = 1$
and $p^*(x \to x + \delta x)$ be the conditional probability density, given $A$, of a
transition from $x$ to $x + \delta x$ in time $\delta t$. Then

$$\begin{aligned}
p^*(x \to x + \delta x) &= p(x \to x + \delta x \text{ and } A)/\mathrm{Prob}(A) \\
  &= p(x \to x + \delta x)\,P_1(x + \delta x)/P_1(x) \\
  &\approx p(x \to x + \delta x)\,[1 + \delta x\,P_1'(x)/P_1(x)],
\end{aligned}$$

where we use the dash notation (') to refer to differentiation with respect
to $x$. Then the conditional process drift coefficient $a^*(x)$ is found from

$$\begin{aligned}
a^*(x)\,\delta t &= \int (\delta x)\,p^*(x \to x + \delta x)\,d(\delta x) \\
  &\approx \int (\delta x)\,p(x \to x + \delta x)\,[1 + (\delta x)P_1'(x)/P_1(x)]\,d(\delta x) \\
  &= \{a(x) + b(x)P_1'(x)/P_1(x)\}\,\delta t.
\end{aligned}$$

Thus it follows that

$$a^*(x) = a(x) + b(x)P_1'(x)/P_1(x). \tag{4.55}$$

It is found similarly that

$$b^*(x) = b(x). \tag{4.56}$$

In the case of the Wright-Fisher model, with no selection or mutation,
so that $a(x) = 0$, $b(x) = x(1 - x)$, $P_1(x) = x$, (4.55) gives

$$a^*(x) = 1 - x. \tag{4.57}$$

This value has already been used implicitly in (3.9). In the same model,
when the condition is made that the allele of interest is eventually lost,

$$a^{**}(x) = -x. \tag{4.58}$$
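The two values (4.57) and (4.58) follow in one line each. The sketch below
assumes, by the parallel argument, that the analogue of (4.55) for
conditioning on loss has $P_0$ in place of $P_1$, and uses $P_0(x) = 1 - x$
for the model with no selection or mutation:

```latex
a^{*}(x)  = a(x) + b(x)\,\frac{P_1'(x)}{P_1(x)}
          = 0 + x(1 - x)\cdot\frac{1}{x} = 1 - x,
\\[4pt]
a^{**}(x) = a(x) + b(x)\,\frac{P_0'(x)}{P_0(x)}
          = 0 + x(1 - x)\cdot\frac{-1}{1 - x} = -x .
```

Conditioning on fixation thus induces a positive drift, and conditioning on
loss a negative drift, even though the unconditional process has none.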



The arguments leading to these formulas can be made more rigorous by
suitable handling of small-order terms. The conditional density $f^*(x;p,t)$
now satisfies the forward equation

$$\frac{\partial f^*(x;p,t)}{\partial t} = -\frac{\partial \{a^*(x) f^*(x;p,t)\}}{\partial x} + \frac{1}{2}\frac{\partial^2 \{b^*(x) f^*(x;p,t)\}}{\partial x^2} \tag{4.59}$$

and the backward equation

$$\frac{\partial f^*(x;p,t)}{\partial t} = a^*(p)\frac{\partial f^*(x;p,t)}{\partial p} + \frac{1}{2}\, b^*(p)\frac{\partial^2 f^*(x;p,t)}{\partial p^2}. \tag{4.60}$$

Using (4.53), (4.55) and (4.56) it is easy to check that these are consistent
with (4.6) and (4.9). The conditional mean absorption time may now be
found by using $a^*(x)$ and $b^*(x)$ in (4.40) and (4.41), and the resulting value
agrees with that found from (4.49) and (4.50). This final approach is more
general in that it uses the defining equations (4.59) and (4.60) and thus
can be used to find higher moments of the conditional absorption time. We
take this point up later when considering specific applications.
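For a concrete check, consider again the assumed case $a(x) = 0$,
$b(x) = x(1 - x)$, for which $\psi = 1$, $P_1(x) = x$ and $P_0(x) = 1 - x$.
Substituting into (4.49) and (4.50) and integrating over $x$ gives the
conditional mean time to fixation (in the diffusion time scale):

```latex
t^{*}(x;p) = \frac{2(1 - p)\,x}{p\,(1 - x)} \quad (0 < x < p),
\qquad
t^{*}(x;p) = 2 \quad (p < x < 1),
\\[4pt]
\bar{t}^{\,*}(p) = \frac{2(1 - p)}{p}\int_0^p \frac{x}{1 - x}\,dx + \int_p^1 2\,dx
 = \frac{2(1 - p)}{p}\bigl\{-p - \log(1 - p)\bigr\} + 2(1 - p)
 = -\frac{2(1 - p)}{p}\,\log(1 - p).
```

As $p \to 0$ this tends to 2, the conditional mean fixation time of a
new allele in this time scale.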

Parallel calculations apply, with the obvious changes, to find the
conditional density function $f^{**}(x;p,t)$ when the condition is made that the
allele of interest is eventually lost from the population.



4.7 Diffusion Theory 

As mentioned in Section 4.1, there exists a deep mathematical theory of 
diffusion processes. Expositions of this theory are given by Ito and McKean 
(1965), Freedman (1971) and Mandl (1968). In this section we consider 
those parts of the theory that are of use to us in genetic processes. Because 
the random variable of interest to us is the frequency of some allele, we 
consider only diffusion processes on the interval [0, 1]. 

The drift and diffusion functions $a(x)$ and $b(x)$ were introduced in (4.3)
and (4.4). They may be used to define the important functions $p(x)$ and
$m(x)$, given respectively by

$$p(x) = \int_c^x \exp\Big(-2\int^y a(z)/b(z)\,dz\Big)\,dy, \tag{4.61}$$

$$m(x) = 2\int^x \{b(y)\}^{-1}\exp\Big(2\int^y a(z)/b(z)\,dz\Big)\,dy, \tag{4.62}$$

for some arbitrary constant $c$. Up to a linear transform, $p(x)$ is identical
to the fixation probability $P_1(x)$. A diffusion is said to be on its natural
scale if $p(x) = x$, which, from (4.61), is equivalent to $a(x) = 0$. For any
diffusion not on its natural scale it is possible to find a transformed random
variable (indeed the transformation is $x \to p(x)$) that is, and this explains
the intimate link between $p(x)$ and $P_1(x)$. For this reason, $p(x)$ is called
the scale function of the diffusion process. We give a name to the function
$m(x)$ in a moment.

The functions $p(x)$ and $m(x)$ are central to many properties of diffusion
processes, and we now show how they can be used to elucidate boundary
behavior. Let $r$ be an arbitrary point in (0, 1) and $s$ be one or other
boundary point (that is $s = 0$ or $s = 1$). From $p(x)$ and $m(x)$ we compute the
functions

$$u(s) = \int_r^s m(x)\,dp(x), \tag{4.63}$$

$$v(s) = \int_r^s p(x)\,dm(x). \tag{4.64}$$

The nature of the boundary $s$ is exhibited as follows:

    u(s)     v(s)     boundary type    accessible?    absorbing?
    < ∞      < ∞      regular          yes            no
    < ∞      = ∞      exit             yes            yes
    = ∞      < ∞      entrance         no             no
    = ∞      = ∞      natural          no             yes
                                                          (4.65)



A boundary is accessible if there exists positive probability that it can be
reached in finite time from a given interior point, and is absorbing if the
process remains forever at the boundary if it should reach it. We later give
genetic examples of all four of these boundary types. The terminology of
boundary type follows Feller (1954), and other terminologies are possible,
for example that of Prohorov and Rozanov (1969).

We have shown above why the description “scale function” is appropriate
to $p(x)$. The definition of $m(x)$ shows that for a process on its natural scale,

$$\frac{dm(x)}{dx} = 2\{b(x)\}^{-1}. \tag{4.66}$$

It is a standard result for a diffusion process with $a(x) = 0$, $b(x) = b$,
that the mean time for the random variable to reach $c \pm \delta$ from the value
$c$ is $b^{-1}\delta^2$, and is thus inversely proportional to the diffusion
coefficient $b$. While in our process $b(x)$ is not constant, it may be so
regarded, to a sufficient approximation, in any small interval $c \pm \delta$.
This shows that to this level of approximation, $dm(x)/dx$ is proportional to
the mean time that the diffusion process takes to leave this interval. This
leads to the term speed measure for $m(x)$, although since larger values of
$dm(x)/dx$ correspond to larger mean times for leaving the interval
$c \pm \delta$, a better name for $m(x)$ would perhaps be “inertia measure”.
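The standard result quoted above can be recovered from (4.19). As a sketch,
take $a = 0$ and $b$ constant on the interval $(c - \delta, c + \delta)$, treat
both ends as absorbing, and write $\bar{t}(y)$ for the mean exit time from the
starting point $y$:

```latex
\frac{1}{2}\,b\,\frac{d^2\bar{t}(y)}{dy^2} = -1,
\qquad \bar{t}(c - \delta) = \bar{t}(c + \delta) = 0
\\[4pt]
\Longrightarrow\quad
\bar{t}(y) = \frac{\delta^2 - (y - c)^2}{b},
\qquad
\bar{t}(c) = \frac{\delta^2}{b} = b^{-1}\delta^2 .
```

Comparison with (4.66) shows that this exit time equals
$\tfrac{1}{2}\delta^2\, dm/dx$ when $b(x)$ is treated as constant near $c$.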

In our informal derivations in previous sections we have assumed without
proof that density functions for diffusion processes exist. McKean (1956)
has shown that a diffusion does have a density function $f(x;p,t)$, where $p$
is the initial value of the diffusion random variable. Specifically, if the value
of the diffusion random variable at time $t$ is denoted $X(t)$,

$$\mathrm{Prob}(c < X(t) < d) = \int_c^d f(x;p,t)\,dx, \tag{4.67}$$

for all $p$, $c$ and $d$ other than boundary points. When there are no natural
boundaries, $f(x;p,t)$ possesses (Elliott (1955)) the eigenfunction expansion

$$f(x;p,t) = w(x)\Big(C + \sum_{n=1}^{\infty} e^{\lambda_n t}\,\phi_n(x)\,\phi_n(p)\Big), \tag{4.68}$$

where $w(x) = dm(x)/dx$ and $0 > \lambda_1 > \lambda_2 > \cdots$ are distinct
eigenvalues. The constant $C$ is zero unless there exists a stationary
distribution, when it takes the value

$$C = \Big(\int_0^1 w(x)\,dx\Big)^{-1}. \tag{4.69}$$

In this case,

$$\lim_{t \to \infty} \mathrm{Prob}(c < X(t) < d) = C\int_c^d w(x)\,dx, \tag{4.70}$$

thus defining the stationary distribution and confirming our more
informally derived (4.45).

When there is no stationary distribution, (4.68) can be written as

$$f(x;p,t) = \sum_{n=1}^{\infty} e^{\lambda_n t}\,(w(x)\phi_n(x))\,\phi_n(p). \tag{4.71}$$

We give a specific example of this expression in (5.11).

The eigenfunctions $\phi_n(x)$ satisfy the normalizing property

$$\int_0^1 \{\phi_n(y)\}^2\, w(y)\,dy = 1, \tag{4.72}$$

this equation holding whether or not there exists a stationary distribution.
The function $w(x)\phi_1(x)$ is analogous to the left eigenvector corresponding
to the leading nonunit eigenvalue of the transition matrix $P$ (see Section
4.2), while $\phi_1(p)$ is analogous to the corresponding right eigenvector.



4.8 Multi-dimensional Processes 



So far we have considered diffusion processes in one dimension only. In a 
number of cases in population genetics theory we are, however, required to 
consider a vector of random variables rather than a single variable, and this 
leads to the consideration of multi-dimensional diffusion processes. We now 
informally extend to the multivariate case some of the derivations given in 
Section 4.2. 

Consider first a set of linearly independent, jointly Markovian variables
$x_1, \ldots, x_k$ for which, after a suitable re-scaling of time and space axes,

$$\begin{aligned}
E(\delta x_i) &= a_i(x_1, \ldots, x_k)\,\delta t + o(\delta t), \\
\mathrm{var}(\delta x_i) &= b_i(x_1, \ldots, x_k)\,\delta t + o(\delta t), \\
\mathrm{covar}(\delta x_i, \delta x_j) &= c_{ij}(x_1, \ldots, x_k)\,\delta t + o(\delta t),
\end{aligned} \tag{4.73}$$

with higher absolute moments of order $o(\delta t)$. Let $f(x_1, \ldots, x_k; t)$ be the
joint density function of these random variables at time $t$. Then proceeding
as in Section 4.2, this density function satisfies the forward equation

$$\begin{aligned}
\frac{\partial f(x_1, \ldots, x_k; t)}{\partial t} = {}
 &-\sum_i \frac{\partial}{\partial x_i}\{a_i(x_1, \ldots, x_k)\, f(x_1, \ldots, x_k; t)\} \\
 &+ \frac{1}{2}\sum_i \frac{\partial^2}{\partial x_i^2}\{b_i(x_1, \ldots, x_k)\, f(x_1, \ldots, x_k; t)\} \\
 &+ \sum\sum_{i<j} \frac{\partial^2}{\partial x_i\,\partial x_j}\{c_{ij}(x_1, \ldots, x_k)\, f(x_1, \ldots, x_k; t)\}.
\end{aligned} \tag{4.74}$$

There will also exist a backward equation of obvious form corresponding
to (4.9). Suppose now a joint stationary density $f = f(x_1, \ldots, x_k)$ exists.
Then from (4.74) this density function will satisfy the equation

$$-\sum_i \frac{\partial}{\partial x_i}(a_i f) + \frac{1}{2}\sum_i \frac{\partial^2}{\partial x_i^2}(b_i f) + \sum\sum_{i<j} \frac{\partial^2}{\partial x_i\,\partial x_j}(c_{ij} f) = 0. \tag{4.75}$$

Unfortunately the concept of a probability flux in several dimensions is
more complex than in one dimension and, perhaps as a consequence, no
simple explicit formula is known for the stationary distribution generalizing
(4.45).

A most important question in multi-dimensional diffusion processes
concerns the possible existence of a “second-order diffusion” or
“quasi-Markovian variable.” We illustrate this concept by an example. In the
two-sex model of Section 3.7 the pair $(x_t, y_t)$, where $x_t$ ($y_t$) is the
frequency of $A_1$ among males (females) in generation $t$, is jointly Markovian.
One suspects that $x_t$ and $y_t$ will seldom differ significantly from each other
and that some weighted average of the two would behave in a “quasi-Markovian”
manner. Such a possibility was investigated by Moran (1958) and Watterson
(1962); we present here an outline of the definitive work of Norman
(1975c) on this point, simplified to cover specifically genetical applications.

Consider a population of size $N$ reproducing at time points $n = 0, 1, 2,
3, \ldots$, and suppose there exists at time $n$ a random variable $X_n$
($0 \le X_n \le 1$) having the properties

$$\begin{aligned}
E\{X_{n+1} - X_n\} &= \tau_N\, a(X_n) + e_{1,n}, \\
E\{(X_{n+1} - X_n)^2\} &= \tau_N\, b(X_n) + e_{2,n}, \\
E\{|X_{n+1} - X_n|^3\} &= e_{3,n}.
\end{aligned} \tag{4.76}$$

Here all expectations are conditional on $X_0, X_1, \ldots, X_n$, $\tau_N > 0$ and
$\tau_N \to 0$ as $N \to \infty$, and the “error” terms $e_{i,n}$ are all
$o(\tau_N)$ in the sense that for any finite $t$,

$$\sum_{n < [t/\tau_N]} E\{|e_{i,n}|\} \to 0 \quad \text{as } N \to \infty. \tag{4.77}$$

The conditions (4.76) are reminiscent of the conditions (4.3)-(4.5),
although we emphasize that $X_n$ is not necessarily a Markovian variable.
In the two-sex model just mentioned, for example, the quantity we use
later for $X_n$ (see Section 5.2) is not Markovian. A function $X_n$ satisfying
(4.76) does not necessarily exist, and if it does it is not necessarily unique:
there may be several “quasi-Markovian” variables satisfying conditions like
(4.76). We expect that under certain reasonable conditions the behavior of
$X_n$ will mimic that of a diffusion variable, and make this expectation more
precise by specializing to the genetic case a general theorem of Norman
(1975c).

Theorem 4.1. Suppose in (4.76) that $a(x)$ and $b(x)$ are polynomials with
$a(0) \ge 0$, $a(1) \le 0$, $b(0) = b(1) = 0$ and $b(x) > 0$, $0 < x < 1$. Define
$X(t)$ as a diffusion variable having initial value $X(0) = X_0$ and drift and
diffusion coefficients $a(x)$, $b(x)$ respectively. Then for any time points
$0 < t_1 < t_2 < \cdots < t_j$ the joint distribution of $X_{n_1}, X_{n_2}, \ldots,
X_{n_j}$ converges to that of $X(t_1), X(t_2), \ldots, X(t_j)$ as $N \to \infty$,
$n_i \to \infty$ and $n_i \tau_N \to t_i$.

We do not prove this remarkable theorem here and note only the sim- 
plicity of the conditions for its applicability. In particular, as we see in the 
next chapter, the two-sex model of Chapter 3 satisfies these conditions, and 
this will lead to a definition of a variance effective population size for this 
model. 



4.9 Time Reversibility 

In this section we discuss, informally and briefly, the extent to which a
diffusion process is time reversible in the sense outlined for Markov chains in
Chapters 2 and 3. We recall the definition (2.163) of time reversibility for
Markov chains, and observe that this implies that

$$\phi_i\, p_{ij}^{(t)} = \phi_j\, p_{ji}^{(t)} \tag{4.78}$$

for any positive integer $t$.

Consider now a diffusion process on [0, 1] possessing a stationary
distribution $f(x)$, given by (4.45). This diffusion process is time-reversible.
We have noted that for Markov chains certain questions involving time
reversibility can be considered even though no stationary distribution exists.
Various devices enable us to do the same thing for diffusion processes. In
this book we shall consider only a useful general procedure, due to Norman
(1978), which we take up in more detail in the next chapter when
considering genetical examples. It is possible that one boundary of the
diffusion process is absorbing and thus that no stationary distribution exists.
When this is so the reversibility argument cannot be applied directly. On
the other hand, it is sometimes possible to alter the diffusion process by
inserting a small parameter $\epsilon$ such that the original process corresponds
to $\epsilon = 0$ and, when $\epsilon > 0$, a stationary distribution does exist.
Thus for $\epsilon > 0$ the time reversibility argument holds and if we now let
$\epsilon \to 0$ it is sometimes possible to derive meaningful results for the
original process by continuity. A specific example of this is given in
Section 5.9.



4.10 Expectations of Functions of Diffusion 
Variables 

In (4.27) we found the expected value of the integral of a function $g(x)$
over the entire time taken before absorption has been reached. In some
cases it is of interest to find the expected value at a single time point $t$,
that is

$$h(t) = E_t[g(x)] = \int_0^1 g(x)\, f(x;p,t)\,dx, \tag{4.79}$$

where $f(x;p,t)$ is the density function of the diffusion random variable at
time $t$ and $p$ its initial frequency. This expectation can be used in a variety
of ways and has been exploited with particular success by Ohta and Kimura
(1969a, 1971a); see also Kimura and Ohta (1971, pp. 183-190). Suppose
at time $t + \delta t$ that the random variable takes the value $x + \delta x$. Then

$$h(t + \delta t) = E(g(x + \delta x)) = E_t E_x(g(x + \delta x)), \tag{4.80}$$

where $E_x$ refers to the expectation operator conditional on the observed
value $x$ at time $t$ and $E_t$ refers to expectation with respect to the
distribution of $x$. Now

$$E_x(g(x + \delta x)) \approx g(x) + E_x(\delta x)\,g'(x) + \frac{1}{2} E_x(\delta x)^2\, g''(x).$$

Inserting these values in (4.80) we get

$$h(t + \delta t) \approx h(t) + E_t\big[\big(a(x)g'(x) + \tfrac{1}{2}\, b(x)g''(x)\big)\,\delta t\big],$$

and hence

$$\frac{dh(t)}{dt} = E_t\big(a(x)g'(x) + \tfrac{1}{2}\, b(x)g''(x)\big). \tag{4.81}$$

If the diffusion process admits a stationary distribution, the limiting case
$t \to \infty$ in (4.81) yields

$$E\big[a(x)g'(x) + \tfrac{1}{2}\, b(x)g''(x)\big] = 0. \tag{4.82}$$
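Two short illustrations may help fix ideas; both use assumed forms of $a(x)$
and $b(x)$ chosen for simplicity. First, with $a(x) = 0$, $b(x) = x(1 - x)$ and
$g(x) = x(1 - x)$, equation (4.81) closes on itself. Second, with a linear
drift $a(x) = -\beta_1 x + \beta_2(1 - x)$ and $g(x) = x$, equation (4.82)
gives the stationary mean directly, without finding the stationary density:

```latex
% (i) a(x) = 0, b(x) = x(1-x), g(x) = x(1-x), so g''(x) = -2:
\frac{dh(t)}{dt} = E_t\bigl[\tfrac{1}{2}\,x(1 - x)\,(-2)\bigr] = -h(t),
\qquad h(t) = p(1 - p)\,e^{-t}.
\\[6pt]
% (ii) a(x) = -\beta_1 x + \beta_2 (1 - x), g(x) = x, so g' = 1, g'' = 0:
E\bigl[-\beta_1 x + \beta_2(1 - x)\bigr] = 0
\quad\Longrightarrow\quad
E(x) = \frac{\beta_2}{\beta_1 + \beta_2}.
```

In case (i) the initial condition $h(0) = p(1 - p)$ follows from (4.79), since
the process starts at the value $p$.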

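A generalization of these equations is possible for multi-dimensional
diffusions. If the diffusion process involves linearly independent variables
$x_1, x_2, \ldots, x_k$, and if $h(t)$ is the expected value of some function
$g(x_1, \ldots, x_k)$ at time $t$, then in an obvious notation

$$\frac{d}{dt} h(t) = E_t\Big\{\sum_i a_i(x_1, \ldots, x_k) \frac{\partial g}{\partial x_i} + \frac{1}{2} \sum_i b_i(x_1, \ldots, x_k) \frac{\partial^2 g}{\partial x_i^2} + \sum\sum_{i<j} c_{ij}(x_1, \ldots, x_k) \frac{\partial^2 g}{\partial x_i \partial x_j}\Big\}. \tag{4.83}$$

If a stationary distribution exists then at stationarity

$$E\Big\{\sum_i a_i(x_1, \ldots, x_k) \frac{\partial g}{\partial x_i} + \frac{1}{2} \sum_i b_i(x_1, \ldots, x_k) \frac{\partial^2 g}{\partial x_i^2} + \sum\sum_{i<j} c_{ij}(x_1, \ldots, x_k) \frac{\partial^2 g}{\partial x_i \partial x_j}\Big\} = 0. \tag{4.84}$$

We give examples later (see in particular Section 6.6) of the use of these
formulas.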
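As a brief illustration of how (4.82) can be used, the following sketch assumes, purely for the sake of example, a linear drift $a(x) = -\beta_1 x + \beta_2(1 - x)$ together with $b(x) = x(1 - x)$, and assumes that a stationary distribution exists. The symbols $\beta_1$ and $\beta_2$ are illustrative constants here. Choosing $g(x) = x$ and then $g(x) = x^2$ gives the first two stationary moments without the stationary density itself being needed.

```latex
% g(x) = x, so g'(x) = 1 and g''(x) = 0:
E\{-\beta_1 x + \beta_2 (1 - x)\} = 0
  \quad\Longrightarrow\quad
  E(x) = \frac{\beta_2}{\beta_1 + \beta_2}.

% g(x) = x^2, so g'(x) = 2x and g''(x) = 2:
E\{2x[-\beta_1 x + \beta_2 (1 - x)] + x(1 - x)\} = 0
  \quad\Longrightarrow\quad
  (2\beta_1 + 2\beta_2 + 1)\, E(x^2) = (2\beta_2 + 1)\, E(x),

\text{so that}\quad
E(x^2) = \frac{\beta_2 (2\beta_2 + 1)}{(\beta_1 + \beta_2)(2\beta_1 + 2\beta_2 + 1)}.
```

Each choice of $g$ yields one linear relation between moments, which is why low-order polynomials are the natural first choices.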




5 

Applications of Diffusion Theory 



5.1 Introduction 

In this chapter we apply some of the diffusion theory considered in the 
previous chapter to various Markov chain models arising in population 
genetics in order to arrive at various conclusions of evolutionary interest. 

Our first aim is to see how the behavior of a given Markov chain can be 
approximated by a diffusion process on [0, 1]. To do this it is convenient to 
start with the general Wright-Fisher model specified by (3.16) and (3.29). 
In this model the variable considered is the number $j$ of $A_1$ genes at some
locus $A$ in a diploid population of fixed size $N$, and thus has state space
$\{0, 1, 2, \ldots, 2N\}$. To work with a variable whose state space is closer to
that of the diffusion process, we consider instead the fraction $x$ of $A_1$ genes
in the population, whose state space is $\{0, (2N)^{-1}, \ldots, 1\}$. We assume the
notation $x$ for the frequency of $A_1$ throughout this chapter except in Section
5.10, where more complex expressions are required. We also write $p$ for the
initial frequency of $A_1$.

So far as other notation is concerned it is convenient to adopt the notation 
given in (1.25), so that the genotype fitnesses are denoted by 



$$w_{11} = 1 + s, \quad w_{12} = 1 + sh, \quad w_{22} = 1. \tag{5.1}$$

Further, when mutation exists, we assume mutation rates $u$ ($A_1 \to A_2$) and
$v$ ($A_2 \to A_1$).

The diffusion model we concentrate on requires that $s$, $u$ and $v$ are all
$O(N^{-1})$. We make this assumption throughout, and then put

$$\alpha = 2Ns, \quad \beta_1 = 2Nu, \quad \beta_2 = 2Nv, \tag{5.2}$$

where $\alpha$, $\beta_1$, and $\beta_2$ are all $O(1)$. Then standard binomial formulas for the
model (3.16) show that

$$E(\delta x \mid x) = \{\alpha x(1 - x)\{x + h(1 - 2x)\} - \beta_1 x + \beta_2(1 - x)\}(2N)^{-1} + o(N^{-1}),$$
$$\mathrm{var}(\delta x \mid x) = x(1 - x)(2N)^{-1} + o(N^{-1}), \tag{5.3}$$
$$E\{|\delta x|^3\} = o(N^{-1}).$$

These moments fit into the format (4.3) - (4.5) provided that we choose

$$\delta t = (2N)^{-1}, \tag{5.4}$$
$$b(x) = x(1 - x), \tag{5.5}$$
$$a(x) = \alpha x(1 - x)\{x + h(1 - 2x)\} - \beta_1 x + \beta_2(1 - x). \tag{5.6}$$



The requirement (5.4) is met by taking unit time in the diffusion process 
to correspond to 2N generations in the Markov chain. It is important to 
keep this scaling in mind when considering the relation between “time” 
properties in the diffusion process and those in the Markov chain. We now 
consider some properties of the diffusion process on [0, 1] with drift and 
diffusion coefficients given respectively by (5.6) and (5.5). 
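The correspondence between the Markov chain moments (5.3) and the coefficients (5.5) and (5.6) can be checked numerically. The sketch below is only illustrative: it assumes a life cycle of selection, then mutation, then binomial sampling of $2N$ genes, which is one common way of realizing a model of this kind and may differ in detail from (3.16) and (3.29). The function names are hypothetical.

```python
# Sketch: exact one-generation moments of a Wright-Fisher step, compared with
# the diffusion coefficients (5.5) and (5.6). The order "selection, then
# mutation, then binomial sampling" is an assumption made for illustration.

def drift_a(x, alpha, h, beta1, beta2):
    """Drift coefficient a(x) of (5.6)."""
    return alpha * x * (1 - x) * (x + h * (1 - 2 * x)) - beta1 * x + beta2 * (1 - x)


def diffusion_b(x):
    """Diffusion coefficient b(x) of (5.5)."""
    return x * (1 - x)


def one_generation_moments(x, N, alpha, h, beta1, beta2):
    """Return (E[dx | x], var[dx | x]) for binomial sampling of 2N genes."""
    s, u, v = alpha / (2 * N), beta1 / (2 * N), beta2 / (2 * N)
    w11, w12, w22 = 1 + s, 1 + s * h, 1.0            # fitnesses as in (5.1)
    mean_w = w11 * x * x + 2 * w12 * x * (1 - x) + w22 * (1 - x) ** 2
    x_sel = x * (w11 * x + w12 * (1 - x)) / mean_w   # frequency after selection
    eta = x_sel * (1 - u) + (1 - x_sel) * v          # frequency after mutation
    # A binomial(2N, eta) count divided by 2N has mean eta, variance eta(1-eta)/2N.
    return eta - x, eta * (1 - eta) / (2 * N)


if __name__ == "__main__":
    N, alpha, h, beta1, beta2, x = 10_000, 2.0, 0.3, 0.5, 0.8, 0.3
    mean_dx, var_dx = one_generation_moments(x, N, alpha, h, beta1, beta2)
    # Multiplying by 2N (that is, dividing by delta t) should recover a(x), b(x).
    print(2 * N * mean_dx, drift_a(x, alpha, h, beta1, beta2))
    print(2 * N * var_dx, diffusion_b(x))
```

The two printed pairs should agree up to terms that vanish as $N$ increases, which is the content of the $o(N^{-1})$ remainders in (5.3).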

Before proceeding we observe that in practical applications the idealized 
model (3.16) will probably have to be replaced by something more complex, 
perhaps one or other of the models discussed in Chapter 3 in connection 
with effective population sizes. At the end of the next section we pursue 
this point for one particular such complex model. Although the theory is by 
no means clear, it seems likely that all the diffusion results given below will 
continue to hold, at least to a good approximation, when N is replaced by 
the variance effective population size. Except for the case considered
at the end of the next section, we make no further explicit mention of this 
point in this chapter. 

The first step in discussing properties of the diffusion process with the
drift and diffusion coefficients (5.6) and (5.5) is to compute the scale
function and speed measure of the process, defined by (4.61) and (4.62).
These become

$$p(x) = \int_c^x y^{-2\beta_2}(1 - y)^{-2\beta_1} \exp\{\alpha(2h - 1)y^2 - 2\alpha h y\}\,dy, \tag{5.7}$$

$$m(x) = 2\int_c^x y^{2\beta_2 - 1}(1 - y)^{2\beta_1 - 1} \exp\{-\alpha(2h - 1)y^2 + 2\alpha h y\}\,dy, \tag{5.8}$$

for an arbitrary constant $c$. We first use these expressions to consider
boundary behavior. Use of (5.7) and (5.8) in (4.63) and (4.64) shows that
near $x = 1$, the functions $u(x)$ and $v(x)$ take the form (for $\beta_1 \ne \tfrac{1}{2}$)

$$u(x) = A + O(1 - x)^{1 - 2\beta_1}, \quad v(x) = B + O(1 - x)^{2\beta_1}.$$

Here $A$ and $B$ are constants whose precise values are unimportant. It follows
that $v(x)$ is always finite at $x = 1$, but that $u(x)$ is finite at this point only
if $\beta_1 < \tfrac{1}{2}$. From this we conclude that the boundary $x = 1$ is regular
(accessible but nonabsorbing) if $\beta_1 < \tfrac{1}{2}$ and entrance (inaccessible and
nonabsorbing) if $\beta_1 > \tfrac{1}{2}$. The same conclusion holds for the boundary
$x = 0$, with $\beta_2$ replacing $\beta_1$. The values of $\alpha$ and $h$ are irrelevant to these
boundary descriptions. The case $\beta_1 = \tfrac{1}{2}$ is easily handled separately.

The intuitive meaning of these conclusions is clear enough. If the mutation
rate from $A_1$ to $A_2$ and the population size are jointly large enough
there is zero probability that the frequency of $A_1$ can ever achieve the
value unity. Of course this conclusion applies for the diffusion process and
is not true for the Markov chain (3.16), a fact we shall take up again later
when considering the accuracy of diffusion approximations, particularly at
a boundary.

If $\beta_1 = 0$ the boundary $x = 1$ is found to be exit (accessible and absorbing),
and this again accords with what we expect since, if the boundary is
reached, the absence of mutation from $A_1$ to $A_2$ means that the frequency
of $A_1$ remains forever at unity. The fact that the boundary is accessible is
less obvious intuitively: in Section 5.8 natural boundaries will be encountered
which, although absorbing, are not accessible, that is, for which there
is zero probability that they are reached by diffusion from within (0, 1).
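The boundary descriptions above depend only on the scaled rate of mutation away from the boundary, and can be summarized in a few lines. The following is a minimal sketch; the function name and the label returned for the borderline value one half are hypothetical, since that case is only said to be handled separately.

```python
def classify_boundary(beta):
    """Classify a boundary of the diffusion with coefficients (5.5) and (5.6).

    beta is the scaled rate of mutation away from the boundary:
    beta_1 = 2Nu for the boundary x = 1, beta_2 = 2Nv for the boundary x = 0.
    """
    if beta < 0:
        raise ValueError("a scaled mutation rate cannot be negative")
    if beta == 0:
        return "exit"        # accessible and absorbing
    if beta < 0.5:
        return "regular"     # accessible but nonabsorbing
    if beta > 0.5:
        return "entrance"    # inaccessible and nonabsorbing
    return "borderline"      # beta = 1/2: treated separately in the text


if __name__ == "__main__":
    N, u = 10_000, 1e-5
    print(classify_boundary(2 * N * u))   # beta_1 = 0.2
```

With $N = 10^4$ and $u = 10^{-5}$ the scaled rate is 0.2, so in the diffusion approximation the boundary $x = 1$ can be reached but is not absorbing.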

The functions p(x) and m(x) are also central to the calculation of fixa- 
tion probabilities and stationary distributions respectively, when these are 
appropriate. We defer consideration of these until we take up specific cases 
later. 

We conclude this section by emphasizing that our main interest is in 
Markov chain models such as (3.16), and we view diffusion processes mainly 
as approximations to these. Usually the approximations are excellent, but 
in some instances, particularly near the boundaries x = 0, x = 1 they are 
less so, and for these cases some care is needed in proceeding. We take up 
this matter in more detail in Section 5.7. 



5.2 No Selection or Mutation 

When there is no selection or mutation the model defined by (3.16) and
(3.29) reduces to (1.48). Rather complete knowledge of the diffusion
approximation to this model is available, and in this section we explore this
in some detail. Clearly we have

$$a(x) = 0, \quad b(x) = x(1 - x), \tag{5.9}$$

and the forward equation becomes

$$\frac{\partial f(x;t)}{\partial t} = \frac{1}{2} \frac{\partial^2}{\partial x^2}\{x(1 - x) f(x;t)\}. \tag{5.10}$$



The solution of this equation, and others more complex, was achieved in 
a series of papers by Kimura (1955a, b, c, 1956, 1957), which heralded 
a rebirth of the mathematical theory of population genetics. Most of the 
results in this section were given in these papers. The explicit solution of 
(5.10), subject to the requirement $x = p$ when $t = 0$, is

$$f(x;p,t) = \sum_{i=1}^{\infty} \frac{4(2i + 1)p(1 - p)}{i(i + 1)}\, T^1_{i-1}(1 - 2p)\, T^1_{i-1}(1 - 2x) \exp\{-\tfrac{1}{2} i(i + 1)t\}. \tag{5.11}$$

Here $T^1_{i-1}(x)$ is a Gegenbauer polynomial defined in terms of the
hypergeometric function by

$$T^1_{i-1}(x) = \tfrac{1}{2} i(i + 1) F(i + 2, 1 - i, 2, \tfrac{1}{2}(1 - x)),$$

so that in particular

$$T^1_0(x) = 1, \quad T^1_1(x) = 3x. \tag{5.12}$$

The speed measure $m(x)$ for the coefficients (5.9) is such that

$$w(x) = dm(x)/dx = 2x^{-1}(1 - x)^{-1}. \tag{5.13}$$

We use this to confirm that (5.11) is of the form defined by (4.67) and
(4.68) with

$$\phi_i(y) = 2(2i + 1)^{1/2}\{i(i + 1)\}^{-1/2}\, y(1 - y)\, T^1_{i-1}(1 - 2y). \tag{5.14}$$

The probabilities $P_0(t)$ and $P_1(t)$ that the diffusion has reached 0 or 1
respectively by time $t$ are

$$P_0(t) = 1 - p + \sum_{i=1}^{\infty} (2i + 1)p(1 - p)(-1)^i F(1 - i, i + 2, 2, 1 - p) \exp\{-\tfrac{1}{2} i(i + 1)t\}, \tag{5.15}$$

$$P_1(t) = p + \sum_{i=1}^{\infty} (2i + 1)p(1 - p)(-1)^i F(1 - i, i + 2, 2, p) \exp\{-\tfrac{1}{2} i(i + 1)t\}. \tag{5.16}$$



The probability of ultimate fixation at $x = 1$ can be found by letting $t \to \infty$
in (5.16) or else by computing (4.17), with $\psi(x)$ defined by (4.16) and (5.9).
Evidently $\psi(x) = 1$ and hence

$$\text{Prob(ultimate fixation at } x = 1) = p. \tag{5.17}$$

The mean fixation time can be found from (4.22) and (4.23). These
equations give

$$t(x;p) = 2(1 - p)/(1 - x), \quad 0 \le x \le p,$$
$$t(x;p) = 2p/x, \quad p \le x \le 1, \tag{5.18}$$

so that the mean absorption time is

$$\bar{t}(p) = -2\{p \log p + (1 - p) \log(1 - p)\} \tag{5.19}$$

time units, or $-4N\{p \log p + (1 - p) \log(1 - p)\}$ generations. This agrees
with the value (3.5) found without recourse to diffusion processes, and
yields (3.6) and (3.7) as cases of particular interest.
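The step from the sojourn function (5.18) to the mean absorption time (5.19) is a single integration over $(0, 1)$. The sketch below checks this numerically with a midpoint rule; the function names and the number of integration steps are illustrative choices, not part of the theory.

```python
import math


def mean_absorption_time(p):
    """Closed form (5.19), in diffusion time units (one unit = 2N generations)."""
    return -2 * (p * math.log(p) + (1 - p) * math.log(1 - p))


def integrate_sojourn(p, steps=20_000):
    """Midpoint-rule integral of t(x; p) from (5.18), split at x = p."""
    def midpoint(f, a, b, n):
        width = (b - a) / n
        return width * sum(f(a + (i + 0.5) * width) for i in range(n))

    below = midpoint(lambda x: 2 * (1 - p) / (1 - x), 0.0, p, steps)  # 0 <= x <= p
    above = midpoint(lambda x: 2 * p / x, p, 1.0, steps)              # p <= x <= 1
    return below + above


if __name__ == "__main__":
    N, p = 1_000, 0.5
    print(mean_absorption_time(p), integrate_sojourn(p))
    # Converting to generations multiplies by 2N.
    print(2 * N * mean_absorption_time(p), "generations")
```

For $p = \tfrac{1}{2}$ the closed form is $2 \log 2 \approx 1.386$ time units, that is about $2.77N$ generations.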

The variance of the absorption time can be found from (4.31) and is

$$\sigma^2(p) = 4\Big(p \int_p^1 \lambda(x)\,dx - (1 - p) \int_0^p \lambda(x)\,dx\Big) - \{\bar{t}(p)\}^2, \tag{5.20}$$

where

$$\lambda(x) = -2 \int_0^x \{(1 - y)^{-1} \log y + y^{-1} \log(1 - y)\}\,dy. \tag{5.21}$$

The value (5.20) is in terms of (squared) time units and must be multiplied
by $4N^2$ to be brought to a (squared) generation basis.

The complete distribution of the absorption time is implicit in (5.15) and 
(5.16), since 



$$\text{Prob}\{\text{absorption time} \le t\} = P_0(t) + P_1(t). \tag{5.22}$$



Because of the form of the solutions (5.15) and (5.16), this expression is of 
most use when t is large. We show later how this solution may be supple- 
mented by an asymptotic expansion the accuracy of which is best for small 
values of t. This asymptotic expansion, together with (5.22), then yields a 
rather complete picture of the distribution of the absorption time. 

What do these diffusion results mean for the Markov chain model (1.48)?
The fixation probability (5.17) is exactly correct for this model, since we
have seen that this value can be reached directly. The mean absorption time
approximation has been confirmed. We have, however, arrived at the more
detailed information, from (5.18) and (4.26), that if the initial number of
$A_1$ genes in the Markov chain model is $k$, the mean number of generations
for which this number assumes the value $j$, before reaching 0 or $2N$, is
approximately

$$\bar{t}_{k,j} = 2(2N - k)/(2N - j), \quad j \le k,$$
$$\bar{t}_{k,j} = 2k/j, \quad j \ge k. \tag{5.23}$$

The particular case $k = 1$, of particular interest to Fisher and Wright, gives
$\bar{t}_{1,j} = 2j^{-1}$, in agreement with (1.56).
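The comparison between the diffusion results and the Markov chain can be made concrete for a small population. The sketch below assumes that (1.48) is the neutral Wright-Fisher chain with binomial transition probabilities, and simply iterates that chain; the function names, the choice $2N = 20$ and the number of generations iterated are illustrative assumptions.

```python
import math


def wright_fisher_matrix(two_n):
    """Transition matrix of the neutral Wright-Fisher chain on {0, ..., 2N}."""
    rows = []
    for i in range(two_n + 1):
        x = i / two_n
        rows.append([math.comb(two_n, j) * x ** j * (1 - x) ** (two_n - j)
                     for j in range(two_n + 1)])
    return rows


def absorb(two_n, k, generations=3000):
    """Iterate from state k; return (P[fixed at 2N], mean absorption time)."""
    P = wright_fisher_matrix(two_n)
    dist = [0.0] * (two_n + 1)
    dist[k] = 1.0
    mean_time = 0.0
    for _ in range(generations):
        # The interior mass at generation t is Prob(absorption time > t);
        # summing it over t gives the mean absorption time.
        mean_time += sum(dist[1:two_n])
        dist = [sum(dist[i] * P[i][j] for i in range(two_n + 1))
                for j in range(two_n + 1)]
    return dist[two_n], mean_time


if __name__ == "__main__":
    two_n, k = 20, 10
    p = k / two_n
    fixed, mean_time = absorb(two_n, k)
    diffusion_value = -2 * two_n * (p * math.log(p) + (1 - p) * math.log(1 - p))
    print(fixed, p)                     # fixation probability against (5.17)
    print(mean_time, diffusion_value)   # generations: chain against diffusion
```

The fixation probability should reproduce $p = k/(2N)$ to rounding error, whereas the mean absorption time is only approximated by the diffusion value, with the agreement improving as $N$ increases.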
We turn now to the spectral expansion (5.11). Recalling the difference
in time scale between the Markov chain (1.48) and the diffusion process
(5.10), it is clear that the eigenvalue $\exp\{-\tfrac{1}{2} i(i + 1)\}$ is the analogue of
the Markov chain value

$$\Big\{\Big(1 - \frac{1}{2N}\Big)\Big(1 - \frac{2}{2N}\Big) \cdots \Big(1 - \frac{i}{2N}\Big)\Big\}^{2N} \approx \exp -\{1 + 2 + \cdots + i\} = \exp\{-\tfrac{1}{2} i(i + 1)\}.$$



There is also a parallel between the eigenfunctions in (5.11) and the
eigenvectors of (1.48). For large $t$ we may write

$$f(x;p,t) = 6p(1 - p) \exp(-t) + 30p(1 - p)(1 - 2p)(1 - 2x) \exp(-3t) + \cdots. \tag{5.24}$$

The function $p(1 - p)$ in the leading term of the expression on the right-hand
side is clearly the analogue of the right eigenvector (1.51). Since this
leading term is independent of $x$, the analogue of the corresponding left
eigenvector is $\ell_k = \text{constant}$. This shows that the asymptotic ($t \to \infty$)
conditional ($x \ne 0, 1$) distribution of $x$ is uniform over (0, 1), in agreement
with the approximation for the Wright-Fisher model (1.48) noted
after equation (1.54). The complete expansion (5.24) shows, as was first
observed by Kimura, that the extensive attention paid to this distribution
was misplaced. The leading term in (5.24) does not dominate the second
term until $t \approx 2$, that is $4N$ generations, and the distribution of the fixation
time, given by (5.22), shows that fixation of one or other allele is likely to
have occurred by this time, especially in the interesting case $p = (2N)^{-1}$.

We have seen that the eigenfunction solutions to equations such as (5.11)
appear in the form

$$f(x;p,t) = \sum_n \{\phi_n(p)\}\{\phi_n(x) w(x)\} \exp(\lambda_n t). \tag{5.25}$$

Thus the eigenfunctions corresponding to initial and current points bear 
a simple relationship to each other. Now in the model (1.48) it is quite 
easy to find the right eigenvectors exactly, but very difficult to find the left 
eigenvectors, and (5.25) suggests an approximation relation between the 
two. Since for the process (5.10) 

$$w(x) = 2x^{-1}(1 - x)^{-1}, \tag{5.26}$$

we may make the approximation for the Wright-Fisher model (1.48)

$$\ell_{ij} = j^{-1}(2N - j)^{-1} r_{ij}, \tag{5.27}$$

where $\ell_{ij}$ ($r_{ij}$) is the $j$th element in the $i$th left (right) eigenvector of the
transition matrix. Since we know $r_{2j} = j(2N - j)$, this derives the uniform
approximation to the leading left eigenvector, and allows a rapid
approximation to the remaining eigenvectors.

Although the solution (5.11) is exact for all $t$, it is most useful for large
values of $t$ (say $t > 1$). For small values of $t$, for example for $t < 0.1$,
the infinite series converges slowly, and many terms must be calculated to
obtain satisfactory approximations. Fortunately, an asymptotic expansion
solution of equation (5.10) dovetails nicely with the solution (6.22) near
$t = 1$, and the two solutions together then allow rather complete knowledge
of the solution of (5.10). We do not give the derivation of this asymptotic
expansion, which is due to Voronka and Keller (1975). A wider range of
applications is given by Tier and Keller (1978). Here we observe only that
for $t < 1$,

$$P_1(p;t) \sim \{p(1 - p)/C\}^{1/4} \exp(-2Ct^{-1}), \tag{5.28}$$

where

$$C = \tfrac{1}{4}\{\arccos(2p - 1)\}^2.$$

Clearly, by symmetry,

$$P_0(p;t) \sim \{p(1 - p)/C'\}^{1/4} \exp(-2C't^{-1}), \tag{5.29}$$

where

$$C' = \tfrac{1}{4}\{\arccos(1 - 2p)\}^2.$$

These two values can be combined to give an asymptotic expansion for
the probability of fixation by time $t$. When $p = \tfrac{1}{2}$, $t = 0.65$, (5.28) gives
$P_1(\tfrac{1}{2}, t) \approx 0.119$, whereas the correct value, found after much computation
from (5.16), is 0.117. For $t = 1$, (5.28) gives $P_1(\tfrac{1}{2}, t) \approx 0.232$ whereas the
approximation

$$P_1(\tfrac{1}{2}, t) \approx \tfrac{1}{2} - \tfrac{3}{4} \exp(-t),$$

found by taking the two leading terms only in (5.16), gives the value 0.224.
(The correct value is 0.223.) Remarks parallel to these apply for the density
function $f(x;p,t)$: the asymptotic expansion is very accurate up to $t = 1$,
where it agrees with the expression found by taking the three leading
terms in (5.11). The latter then provide excellent approximations for large
$t$ values.
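The numerical comparison just described can be reproduced in a few lines. The sketch below evaluates the truncated series (5.16) and the small-$t$ approximation (5.28), taking $C = \tfrac{1}{4}\{\arccos(2p - 1)\}^2$, which is the value that reproduces the figures 0.119 and 0.232 quoted above for $p = \tfrac{1}{2}$. The function names and the truncation point are illustrative assumptions, and the terminating hypergeometric sum is coded directly rather than taken from a library.

```python
import math


def hyp2f1_poly(a, b, c, z):
    """F(a, b; c; z) for a = -n with n a non-negative integer (terminating sum)."""
    n = -a
    term, total = 1.0, 1.0
    for k in range(n):
        # Ratio of consecutive terms of the hypergeometric series.
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total


def p1_series(p, t, terms=20):
    """Probability of fixation at x = 1 by time t, truncating the series (5.16)."""
    total = p
    for i in range(1, terms + 1):
        total += ((2 * i + 1) * p * (1 - p) * (-1) ** i
                  * hyp2f1_poly(1 - i, i + 2, 2, p)
                  * math.exp(-0.5 * i * (i + 1) * t))
    return total


def p1_small_t(p, t):
    """Small-t approximation (5.28), with C = (arccos(2p - 1))^2 / 4."""
    c = 0.25 * math.acos(2 * p - 1) ** 2
    return (p * (1 - p) / c) ** 0.25 * math.exp(-2 * c / t)


if __name__ == "__main__":
    print(p1_small_t(0.5, 0.65), p1_series(0.5, 0.65))   # compare 0.119, 0.117
    print(p1_small_t(0.5, 1.0), p1_series(0.5, 1.0, terms=2))  # compare 0.232, 0.224
```

The truncated series is intended for moderate and large $t$ only; for very small $t$ many terms would be needed and the alternating hypergeometric sums lose floating-point accuracy, which is precisely where the asymptotic form is the more practical tool.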

We consider now processes conditional on the event that a specified
boundary is eventually reached. We suppose for definiteness that $x = 1$ is
the absorbing state ultimately reached. Equations (4.53) and (5.11) show
that the density function of $x$ at time $t$ is

$$f^*(x;p,t) = \sum_{i=1}^{\infty} \frac{4(2i + 1)x(1 - p)}{i(i + 1)}\, T^1_{i-1}(1 - 2p)\, T^1_{i-1}(1 - 2x) \exp\{-\tfrac{1}{2} i(i + 1)t\}. \tag{5.30}$$

For large $t$ and small $p$ this gives

$$f^*(x;p,t) \sim 6x \exp(-t), \tag{5.31}$$

so that

$$\lim_{t \to \infty} f(x \mid x \ne 0, 1, \text{ eventual fixation at } x = 1) = 2x. \tag{5.32}$$



It is interesting to compare this with the expression in (1.86). There is no 
similarity between the two expressions, and we conclude that this is a case 
where the nature of the branching process on which (1.86) is based, and 
that of the model (1.48), are sufficiently different so that one gives little 
information about the other for large t values. 

The functions $t^*(x)$ defined in (4.49) and (4.50) become

$$t^*(x) = 2(1 - p)x/\{p(1 - x)\}, \quad 0 \le x \le p,$$
$$t^*(x) = 2, \quad p \le x \le 1. \tag{5.33}$$

The conditional mean absorption time, found by integration, is then

$$\bar{t}^*(p) = -2p^{-1}(1 - p) \log(1 - p). \tag{5.34}$$

In the Markov chain (1.48) this suggests the approximation that if $k$ is the
initial value of the Markov variable,

$$\bar{t}^*_{k,j} \approx 2(2N - k)j/\{k(2N - j)\}, \quad j \le k,$$
$$\bar{t}^*_{k,j} \approx 2, \quad j \ge k, \tag{5.35}$$

and a conditional mean fixation time of $-4Np^{-1}(1 - p) \log(1 - p)$ generations.
One interesting case arises when $k = 1$, so that there is initially only
one gene of the allele of interest. One case of this concerns a unique selectively
neutral new mutant destined for fixation. Equation (5.35) shows that
on average, this allele spends two generations at each possible frequency
value ($j = 1, 2, \ldots, 2N - 1$), so that if $\bar{t}^*_1$ is the conditional mean fixation
time,

$$\bar{t}^*_1 = 4N - 2 \tag{5.36}$$

generations. It is instructive to see how easily information about the
conditional process can be found from information concerning the unconditional
process.
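The two routes to the conditional mean fixation time of a single new mutant can be compared directly: summing the sojourn approximation (5.35) over the interior states, or evaluating the integrated form based on (5.34) at $p = (2N)^{-1}$. This is a minimal sketch with hypothetical function names.

```python
import math


def conditional_sojourn_generations(N, k, j):
    """Approximation (5.35): mean generations spent at j copies, given fixation."""
    if j <= k:
        return 2 * (2 * N - k) * j / (k * (2 * N - j))
    return 2.0


def conditional_mean_fixation_generations(N, k):
    """-4N (1 - p) log(1 - p) / p with p = k / (2N), the generation form of (5.34)."""
    p = k / (2 * N)
    return -4 * N * (1 - p) * math.log1p(-p) / p


if __name__ == "__main__":
    N = 1_000
    by_sum = sum(conditional_sojourn_generations(N, 1, j) for j in range(1, 2 * N))
    by_formula = conditional_mean_fixation_generations(N, 1)
    print(by_sum, by_formula)   # 4N - 2 and a value close to 4N - 1
```

The sum gives $4N - 2$ exactly, while the integrated form gives a value close to $4N - 1$; the difference of about one generation is within the accuracy expected of either approximation, and both are effectively $4N$.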

The conditional variance of the absorption time can be found by solving
(4.30), subject to appropriate boundary conditions. Here we must use the
conditional process drift coefficient

$$a^*(x) = 1 - x$$

rather than the unconditional value. It is found that

$$(\sigma^*)^2(p) = \frac{4\pi^2}{3} + 8p^{-1}(1 - p) \log(1 - p)\{1 - (2p)^{-1}(1 - p) \log(1 - p)\} - 8 \sum_{i=1}^{\infty} \frac{p^i}{i^2}. \tag{5.37}$$

In the limiting case $p \to 0$ this gives

$$(\sigma^*)^2 \sim 8\Big(\frac{\pi^2}{6} - 1.5\Big), \tag{5.38}$$

or, for the process (1.48), $4.64N^2$ (squared generations). The complete
distribution of the conditional absorption time can be found immediately from
(5.16). We have

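$$\text{Prob}\{\text{absorption at } x = 1 \text{ before time } t \mid \text{eventual absorption at } x = 1\}$$
$$= \frac{\text{Prob}\{\text{absorption at } x = 1 \text{ before time } t\}}{\text{Prob}\{\text{eventual absorption at } x = 1\}} \tag{5.39}$$
$$= 1 + \sum_{i=1}^{\infty} (2i + 1)(1 - p) F(i + 2, 1 - i, 2, p)(-1)^i \exp\{-\tfrac{1}{2} i(i + 1)t\}.$$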

The expressions (5.34) and (5.38) can in principle be found from this dis- 
tribution, but it is far simpler to arrive at them in the manner we have 
shown. Kimura (1970) established (5.39) and discussed the nature of the 
corresponding density function for various p values. 

The asymptotic expansion of Voronka and Keller may be used immediately
for conditional processes. Thus, for example, (5.28) gives

$$P^*(p;t) \sim p^{-1}\{p(1 - p)/C\}^{1/4} \exp(-2Ct^{-1}), \tag{5.40}$$

where $P^*(p;t)$ is the conditional probability of fixation at $x = 1$ by time $t$,
given fixation eventually occurs. Similarly the conditional density function
$f^*(x;p,t)$ can be accurately approximated for small $t$, and this leads, as
with $f(x;p,t)$, to rather complete knowledge of its nature.

We consider briefly the case where we condition on eventual loss of the
allele $A_1$. Since for this case $a(x) = 0$, $b(x) = x(1 - x)$ and $P_0(x) = 1 - x$,
the analogue of (4.55) gives $a^{**} = -x$. The analogue of (5.34) is, in terms of
generations, $\bar{t}^{**} = -4N(p \log p)/(1 - p)$. This is identical to the value given
in (3.22), and this is not surprising since the value of the drift coefficient
$a^{**}$ given above is identical to that given in (5.61) below for the one-way
mutation case when $\theta = 2$. Since the diffusion coefficients are also the same
in the two cases, the entire stochastic behavior of the conditional process
without mutation and the unconditional process with mutation (for $\theta = 2$)
are identical. This seems for the moment to be no more than a curiosity,
but it will have a more interesting interpretation when considering age and
retrospective properties in Chapter 9.

We conclude this section by discussing a model generalizing (1.48). Consider
for example the two-sex model introduced in Section 3.7. Using the
notation of that section, the quantities $k(t)$ (the number of $A_1$ genes among
the males in generation $t$) and $\ell(t)$ (the number of $A_1$ genes among the
females in generation $t$), are jointly Markovian. We make progress by using
the Norman theory of quasi-Markovian variables introduced in Section 4.8.
The weighted average gene frequency $x(t)$ is

$$x(t) = k(t)(4N_1)^{-1} + \ell(t)(4N_2)^{-1} \tag{5.41}$$

and it is easy to check, for the model considered, that

$$E\{x(t + 1) - x(t) \mid x(t)\} = 0,$$
$$\mathrm{var}\{x(t + 1) - x(t) \mid x(t) = x\} = x(1 - x)(2N_e)^{-1} + e_{2,t}, \tag{5.42}$$
$$E\{|x(t + 1) - x(t)|^3\} = e_{3,t},$$

where $N_e = 4N_1N_2/(N_1 + N_2)$ and, for large $N_1$ and $N_2$, the error terms
$e_{i,t}$ can be shown to satisfy (4.77) with $\tau_N = (2N_e)^{-1}$. Thus $x(t)$ is a
quasi-Markovian variable, and the conclusions given in Section 4.8 for such
variables can be applied. In particular the probability of fixation of $A_1$ is

$$k(0)(4N_1)^{-1} + \ell(0)(4N_2)^{-1}.$$

The variance formula confirms the value (3.124) for the variance effective
population size.
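The two-sex quantities above are simple enough to evaluate directly. In the sketch below the function names are hypothetical, and $N_1$ and $N_2$ are read as the numbers of males and females respectively, following the description of $k(t)$ and $\ell(t)$.

```python
def two_sex_effective_size(n_males, n_females):
    """N_e = 4 N_1 N_2 / (N_1 + N_2), as given after (5.42)."""
    return 4 * n_males * n_females / (n_males + n_females)


def two_sex_fixation_probability(k0, l0, n_males, n_females):
    """Initial weighted frequency k(0)/(4 N_1) + l(0)/(4 N_2)."""
    return k0 / (4 * n_males) + l0 / (4 * n_females)


if __name__ == "__main__":
    # 10 males and 90 females behave, for drift, like 36 individuals.
    print(two_sex_effective_size(10, 90))
    # A single new mutant gene carried by a male.
    print(two_sex_fixation_probability(1, 0, 10, 90))
```

A new mutant arising in the rarer sex has the larger fixation probability, since each sex contributes half of the genes of the next generation whatever its size.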



5.3 Selection 

Suppose now that the three genotypes have fitnesses given by (5.1). If there
is no mutation, the drift coefficient (5.6) becomes

$$a(x) = \alpha x(1 - x)\{x + h(1 - 2x)\}. \tag{5.43}$$

From this the scale function and speed measure are calculated as

$$p(x) = \int_{c_1}^x \psi(y)\,dy, \tag{5.44}$$

$$m(x) = 2\int_{c_2}^x \{y(1 - y)\psi(y)\}^{-1}\,dy, \tag{5.45}$$

where

$$\psi(y) = \exp \alpha\{(2h - 1)y^2 - 2hy\}. \tag{5.46}$$

Both boundaries $x = 0$, $x = 1$ are exit, and the probability that one or the
other boundary is eventually reached is unity. The respective probabilities
are given by (4.15) and (4.17), with $\psi(y)$ defined by (5.46).

These expressions simplify significantly only in the case of no dominance
($h = \tfrac{1}{2}$), for which

$$P_1(p) = \{1 - \exp(-\alpha p)\}/\{1 - \exp(-\alpha)\}. \tag{5.47}$$

This agrees with the approximation (3.31) found without using diffusion
methods. Some numerical values calculated from (5.47) are given in Table 5.1.

                        p = 0.001                          p = 0.5
  s           N = 10^4   N = 10^5   N = 10^6    N = 10^4   N = 10^5   N = 10^6
  0.01         0.181      0.865      1.000       1.000      1.000      1.000
  0.001        0.020      0.181      0.865       1.000      1.000      1.000
  0.0001       0.002      0.020      0.181       0.731      1.000      1.000
  0.00001      0.001      0.002      0.020       0.525      0.731      1.000

Table 5.1. Values of $P_1(p)$, for various values of $N$, $s$, and $p$, calculated from
(5.47)
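The entries of Table 5.1 follow directly from (5.47) with $\alpha = 2Ns$. The sketch below recomputes them; the function name is hypothetical, and the right-hand block of the table is taken to correspond to $p = 0.5$, the value consistent with the tabulated numbers.

```python
import math


def fixation_probability(N, s, p):
    """P_1(p) of (5.47) for no dominance (h = 1/2), with alpha = 2 N s."""
    alpha = 2 * N * s
    if alpha == 0:
        return p                      # neutral limit, as in (5.17)
    # expm1(y) = exp(y) - 1 keeps precision when alpha * p is small.
    return math.expm1(-alpha * p) / math.expm1(-alpha)


if __name__ == "__main__":
    for p in (0.001, 0.5):
        print("p =", p)
        for s in (1e-2, 1e-3, 1e-4, 1e-5):
            row = [fixation_probability(N, s, p) for N in (10**4, 10**5, 10**6)]
            print(s, " ".join(f"{value:.3f}" for value in row))
```

Only the products $\alpha = 2Ns$ and $\alpha p$ matter, which is why the same values recur along the diagonals of each block of the table.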

The conclusions to be drawn from this table are straightforward. When
$N$, $s$ and $p$ are jointly sufficiently large, fixation of the favored allele is
essentially certain: this occurs approximately when $Nsp > 5$. As $N$, $s$ or
$p$ decreases, the fixation probability decreases, and if $Ns < 0.1$ it does not
differ (relatively) by more than 10% from the neutral value $p$. Perhaps the
most striking conclusion is the very strong effect of selection in influencing
fixation probabilities: as noted below (3.31), selective differences far
too small to be found in the laboratory can nevertheless have a decisive
effect on evolutionary behavior, at least in populations that are not too
small. The same conclusion holds, at least qualitatively, when there is
dominance (that is $h \ne \tfrac{1}{2}$), although some minor modifications to the
numerical values are necessary, especially when dominance is complete ($h = 0$
or $h = 1$). Even in the overdominance case ($sh > s > 0$) fixation of one
or the other allele is certain although, as we see later, this will normally 
take an extremely long time, and in practical terms one must then question 
the appropriateness of the assumptions made, in particular that there is no 
mutation and that the population size, selective differences, and dominance 
relationship remain unchanged throughout the entire fixation process. So 
far as fluctuations in population size are concerned, it seems likely, for any 
fixed selection scheme, that (5.47) and (4.17) still apply if N is replaced by 
the variance effective population size, although the theory for this has not 
been verified. 

The complete solution of the forward equation (4.6), with a(x) and b(x) 
defined by (5.43) and (5.5), is very complex. Nevertheless solutions were 
found by Kimura (1955a, b, c; 1957) initially for the no dominance case and 
subsequently for the general case. Unfortunately the very complexity of the 
solutions makes examination of their implications difficult, although this 
does not detract from the influence that the derivation of these solutions 
has had on population genetics theory. For more details concerning these
solutions, see Crow and Kimura (1970, pp. 396-414).



5.4 Selection: Absorption Time Properties

Despite the very complex form of $f(x;t)$ referred to at the end of the
previous section, a rather simple expression exists for the function $t(x;p)$,
defined in (4.25), and since this function summarizes perhaps the most
important features of the transient behavior of the process, we now compute
it for the selective model we are considering. All that is required to do this
is to substitute (5.43) and (5.5) into the general formulas (4.22) and (4.23).
For $h = \tfrac{1}{2}$ we get

$$t(x;p) = 2P_0(p)\{\alpha x(1 - x)\}^{-1}\{\exp(\alpha x) - 1\}, \quad 0 \le x \le p,$$
$$t(x;p) = 2P_1(p)\{\alpha x(1 - x)\}^{-1}\{1 - \exp(-\alpha(1 - x))\}, \quad p \le x \le 1, \tag{5.48}$$

where $P_1(p)$ is found from (5.47) and $P_0(p) = 1 - P_1(p)$. For the Markov
chain defined by (3.16) and (3.29), this implies that the mean number of
generations for which there are $j = 2Nx$ $A_1$ alleles, given an initial number
$k = 2Np$, is approximately

$$\bar{t}_{k,j} = 2\{\exp(-\alpha p) - \exp(-\alpha)\}\{\exp(\alpha x) - 1\}\,[\alpha x(1 - x)\{1 - \exp(-\alpha)\}]^{-1} \quad (j \le k),$$
$$\bar{t}_{k,j} = 2\{1 - \exp(-\alpha p)\}\{1 - \exp -\alpha(1 - x)\}\,[\alpha x(1 - x)\{1 - \exp(-\alpha)\}]^{-1} \quad (j \ge k). \tag{5.49}$$

For $k = 1$ this gives (1.62) if we make the approximation $1 - \exp(-\alpha/2N) = \alpha/2N$.
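The sojourn function (5.48) is easy to evaluate, and two consistency checks are available: it should be continuous at $x = p$, and as $\alpha \to 0$ it should reduce to the neutral form $2(1 - p)/(1 - x)$ for $x \le p$ and $2p/x$ for $x \ge p$. The sketch below assumes $h = \tfrac{1}{2}$ and $\alpha \ne 0$; the function names are hypothetical.

```python
import math


def p1(p, alpha):
    """Fixation probability (5.47), no dominance."""
    return math.expm1(-alpha * p) / math.expm1(-alpha)


def sojourn_selection(x, p, alpha):
    """t(x; p) of (5.48) for h = 1/2 and alpha != 0, in diffusion time units."""
    denom = alpha * x * (1 - x)
    if x <= p:
        # 2 P_0(p) {exp(alpha x) - 1} / {alpha x (1 - x)}
        return 2 * (1 - p1(p, alpha)) * math.expm1(alpha * x) / denom
    # 2 P_1(p) {1 - exp(-alpha (1 - x))} / {alpha x (1 - x)}
    return 2 * p1(p, alpha) * (-math.expm1(-alpha * (1 - x))) / denom


if __name__ == "__main__":
    p = 0.4
    # Nearly neutral case: compare with 2(1 - p)/(1 - x) and 2p/x.
    print(sojourn_selection(0.2, p, 1e-6), 2 * (1 - p) / (1 - 0.2))
    print(sojourn_selection(0.7, p, 1e-6), 2 * p / 0.7)
    # Continuity across x = p for appreciable selection.
    print(sojourn_selection(p - 1e-9, p, 5.0), sojourn_selection(p + 1e-9, p, 5.0))
```

Multiplying any of these values by $2N \, \delta x$ with $\delta x = (2N)^{-1}$ leaves them unchanged, which is why the same expressions serve as mean numbers of generations in (5.49).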

The mean time for fixation is found jointly from (4.21) and (5.48), but
unfortunately no explicit evaluation of the integrals is possible, and
numerical computation is necessary. There is, however, one case where useful
progress can be made. If $\alpha$ and $p$ are jointly sufficiently large so that fixation
of the favored allele can be taken as being almost certain, we get

$$t(x;p) \sim 0, \quad 0 \le x \le p,$$
$$t(x;p) \sim 2\{\alpha x(1 - x)\}^{-1}, \quad p \le x \le 1 - 4\alpha^{-1},$$
$$t(x;p) \sim 2\{\alpha x(1 - x)\}^{-1}\{1 - \exp -\alpha(1 - x)\}, \quad 1 - 4\alpha^{-1} \le x \le 1. \tag{5.50}$$

The first equation shows that in the case considered, the frequency of the
favored allele spends negligible time at values less than its initial value. The
second equation is perhaps the most interesting. Converting to generations, it
implies that in the Markov chain the mean time spent in the frequency range
$(x_1, x_2)$, where $p < x_1 < x_2 < 1 - 4\alpha^{-1}$, is approximately

$$\int_{x_1}^{x_2} 2\{sx(1 - x)\}^{-1}\,dx$$

generations. This is identical to the value (1.29) found for the corresponding
deterministic process, and we can conclude that the behavior of the process
in $(p, 1 - 4\alpha^{-1})$ is "quasi-deterministic". When the frequency exceeds $1 - 4\alpha^{-1}$
the deterministic value no longer gives an adequate guide to the stochastic
behavior. In particular, the mean number of generations (in the Markov
chain) for which $x = 1 - i(2N)^{-1}$, for small integers $i$, is essentially equal
to the "neutral" value 2. This is severely overestimated by the deterministic
formula, and clearly at this stage of the process selective forces have become
of secondary importance, and random sampling almost wholly determines
the gene frequency behavior.
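The integral giving the quasi-deterministic time has an elementary closed form, obtained by partial fractions. The numerical values in the worked line below ($s = 0.01$, $x_1 = 0.1$, $x_2 = 0.9$) are illustrative assumptions, and the result applies only when $p < x_1$ and $x_2 < 1 - 4\alpha^{-1}$, which for these values requires $\alpha = 2Ns > 40$.

```latex
\int_{x_1}^{x_2} \frac{2}{s\,x(1 - x)}\,dx
  = \frac{2}{s} \int_{x_1}^{x_2} \Big( \frac{1}{x} + \frac{1}{1 - x} \Big)\,dx
  = \frac{2}{s} \log \frac{x_2 (1 - x_1)}{x_1 (1 - x_2)} .

% Illustration: s = 0.01, x_1 = 0.1, x_2 = 0.9
\frac{2}{0.01} \log \frac{0.9 \times 0.9}{0.1 \times 0.1}
  = 200 \log 81 \approx 879 \ \text{generations}.
```

The time depends on $N$ only through the requirement that the interval lie inside the quasi-deterministic range, not through the value of the integral itself.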

For general values of $h$ in (0, 1), the expressions (4.22) and (4.23) do
not simplify readily. However, the general behavior just noted for the no
dominance case continues to apply. Quasi-deterministic behavior obtains
for sufficiently large $p$ and $\alpha$, at least until the frequency $x$ of $A_1$ approaches
unity, when selective forces once more can be ignored. The value of $x$ where
this occurs will depend to some extent on the level of dominance but will
not differ materially from the value $1 - 4\alpha^{-1}$ found in the no dominance
case.

The value $k = 1$ is of particular interest. Here we may approximate
the probability $P_1\{(2N)^{-1}\}$ by $(2N)^{-1}\big(\int_0^1 \psi(y)\,dy\big)^{-1}$ and this leads, in the
Markov chain, to the approximation (1.60).

The variance of the absorption time is given in principle by (4.31), but 
evaluation of this will certainly require numerical methods. 

The overdominance case presents features of special interest. Here the 
deterministic theory gives a stable polymorphism while the stochastic the- 
ory predicts eventual loss of one or other allele. On the other hand, it is 
plausible, in the stochastic case, that extremely long periods of time are 
spent near the quasi-equilibrium point, at least when selection is strong, so 
that in some sense the deterministic theory provides a useful guide to the 
stochastic behavior. (Of course, if mutation is allowed, an entirely different 
stochastic behavior arises and one which should be well described by the 
deterministic theory.) We now discuss the stochastic process in more detail, 
and show in particular that the plausibility argument given above does not 
necessarily apply. 

We start from the observation just made, that in the stochastic process
fixation of one or other allele is eventually certain if there is no mutation,
and ask how much time fixation takes to occur. This is best answered by
considering the mean fixation time or, more crudely, the leading eigenvalue
in the spectral expansion of the density function $f(x;t)$. The latter approach
was taken by Miller (1962) and Robertson (1962) and produced a
surprising answer. Define the quasi-equilibrium point by $x^*$ (see (1.31); we
now assume the notation (1.25c)). If $x^*$ is close to 0 or 1 it is possible that
this leading eigenvalue $\lambda_1$, in an expansion of the form (4.68), is larger in
absolute value than for the selectively neutral case. This suggests a more
rapid fixation process under overdominance than under neutrality. This
perhaps surprising conclusion is explained by noting that if $x^*$ is close to 0
(or 1), selection tends to drive the frequency of $A_1$ close to 0 (or 1)
comparatively rapidly, and then random sampling effects which, as we have noted,
play a predominant role near the boundaries, lead to loss or fixation of $A_1$.
The magnitude of this effect can be measured by taking the ratio of the
absolute value of the leading eigenvalue to its neutral counterpart. When
$x^*$ is close to 0 or 1, this ratio increases with $s$ for values of $s$ of order a few
percent, although for very large $s$ values the ratio ultimately decreases as $s$
increases. The discussion just given shows why this behavior should occur.
For intermediate values of $x^*$ (approximately $0.2 < x^* < 0.8$) the ratio
always decreases as $s$ increases, so that here heterosis always slows down
the fixation process. It is clear that for these values of $x^*$, selection does
not provide a thrust towards the boundaries sufficient to speed up fixation.
Tables and graphs illustrating this behavior are given by Robertson (1962);
Robertson centered attention on retardation behavior and thus considered
the reciprocal of the ratio defined above.

A more complete analysis is provided by considering the mean fixation
time, although a further degree of complexity arises here since this depends,
unlike the eigenvalues, on the initial frequency $p$ of $A_1$. It is perhaps natural
to pay special attention to the case $p = x^*$, although a general analysis is
quite straightforward. The mean fixation time is given by (4.22), (4.23),
(4.15) and (4.17), where $b(x) = x(1 - x)$ and

$$\psi(x) = \exp\{\alpha(2h - 1)(x - x^*)^2\}. \tag{5.51}$$

The retardation factor as measured by mean fixation times corresponding
to Robertson's eigenvalue ratio is the ratio of the mean fixation time as
just calculated to its neutral theory counterpart (5.19). We do not give here
details of the numerical values found: for these see Ewens and Thomson
(1970). The conclusions are in general agreement with those of Robertson,
at least for $p = x^*$. For general values of $p$ the behavior can be quite
complex, the retardation factor sometimes increasing, then decreasing, then
increasing again as $s$ increases.

We turn now to mean absorption times conditional on eventual fixation
(or loss) of a specified allele. The formulas appropriate to calculate this are
(4.49) and (4.50) or (4.51) and (4.52). Perhaps the case of greatest interest
is when $0 < h < 1$ and the condition is made that the favored allele fixes.
When $h = \tfrac{1}{2}$, equations (4.49) and (4.50) give

$$t^*(x;p) = 2e^{-\alpha x}\{1 - e^{-\alpha(1 - p)}\}\{e^{\alpha x} - 1\}^2\,[\alpha x(1 - x)\{1 - e^{-\alpha}\}\{e^{\alpha p} - 1\}]^{-1}, \quad 0 \le x \le p, \tag{5.52}$$

$$t^*(x;p) = 2\{e^{\alpha x} - 1\}\{e^{\alpha(1 - x)} - 1\}\,[\alpha x(1 - x)\{e^{\alpha} - 1\}]^{-1}, \quad p \le x \le 1. \tag{5.53}$$

Similarly, if the condition is made that eventually $A_1$ is lost,

$$t^{**}(x;p) = 2\{e^{\alpha x} - 1\}\{e^{\alpha(1 - x)} - 1\}\,[\alpha x(1 - x)\{e^{\alpha} - 1\}]^{-1}, \quad 0 \le x \le p, \tag{5.54}$$

$$t^{**}(x;p) = 2e^{-\alpha(1 - x)}\{1 - e^{-\alpha p}\}\{e^{\alpha(1 - x)} - 1\}^2\,[\alpha x(1 - x)\{1 - e^{-\alpha}\}\{e^{\alpha(1 - p)} - 1\}]^{-1}, \quad p \le x \le 1. \tag{5.55}$$

There are several interesting points about these equations. First, the value 
of t*(x;p) for x > p is identical to that of t**(x;p) for x < p. We explain 
why this should be so when considering time-reversal properties in Section 
5.9. The second point concerns the nature of the formula for t*(x;p) for 
very small p, or correspondingly t**(x;p) for very large p, and is relevant 
when considering a selectively favored new mutant destined for fixation. 
We observe that $t^*(x;p)$ is symmetric about $x = 0.5$; the mean time spent
in any interval $(x, x + \delta x)$ is the same as the mean time spent in
$(1 - x - \delta x, 1 - x)$. Even more surprisingly, $t^*(x;p)$ remains
unchanged if $\alpha$ is replaced by $-\alpha$, so that a selectively
disadvantageous mutant, if destined for fixation, spends as much time, on the
average, in any frequency range as a corresponding selectively advantageous
mutant destined for fixation. This remarkable fact, noted first in effect by
Maruyama (1974), will be reconsidered later in the light of time-reversal
properties. It is indeed easy to see that the entire behavior of the
conditional process is independent of the sign of $s$, since the diffusion
coefficient $b^*(x)$, calculated from (4.56) and (5.5), is independent of $s$
while the drift coefficient $a^*(x)$, calculated from (4.55), (5.43), and
(5.47), is

$$a^*(x) = \tfrac{1}{2}\alpha x(1 - x)/\tanh(\tfrac{1}{2}\alpha x).$$

Clearly $a^*(x)$ is independent of the sign of $\alpha$: This more detailed
conclusion was first noted by Watterson (1977b). However, despite the symmetry
of $t^*(x)$ around $x = \frac{1}{2}$ it is not true that $a^*(x) = a^*(1 - x)$.
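
The symmetries just described can be checked numerically. The following is a
minimal Python sketch, not part of the original analysis, that evaluates
(5.52) to (5.55) and the conditional drift $a^*(x)$ for the no-dominance case
$h = \frac{1}{2}$. The function names and the parameter values are
illustrative assumptions.

```python
import math


def t_star(x, p, alpha):
    """Equations (5.52)-(5.53): sojourn density conditional on fixation."""
    if x <= p:
        num = (2 * math.exp(-alpha * x) * (1 - math.exp(-alpha * (1 - p)))
               * (math.exp(alpha * x) - 1) ** 2)
        den = (alpha * x * (1 - x) * (1 - math.exp(-alpha))
               * (math.exp(alpha * p) - 1))
    else:
        num = 2 * (math.exp(alpha * x) - 1) * (math.exp(alpha * (1 - x)) - 1)
        den = alpha * x * (1 - x) * (math.exp(alpha) - 1)
    return num / den


def t_2star(x, p, alpha):
    """Equations (5.54)-(5.55): sojourn density conditional on loss."""
    if x <= p:
        num = 2 * (math.exp(alpha * x) - 1) * (math.exp(alpha * (1 - x)) - 1)
        den = alpha * x * (1 - x) * (math.exp(alpha) - 1)
    else:
        num = (2 * math.exp(-alpha * (1 - x)) * (1 - math.exp(-alpha * p))
               * (math.exp(alpha * (1 - x)) - 1) ** 2)
        den = (alpha * x * (1 - x) * (1 - math.exp(-alpha))
               * (math.exp(alpha * (1 - p)) - 1))
    return num / den


def a_star(x, alpha):
    """Conditional drift coefficient for h = 1/2."""
    return 0.5 * alpha * x * (1 - x) / math.tanh(0.5 * alpha * x)


alpha, p = 4.0, 0.01
print(t_star(0.3, p, alpha), t_star(0.7, p, alpha))        # symmetric about 1/2
print(t_star(0.3, p, alpha), t_star(0.3, p, -alpha))       # sign of alpha
print(t_star(0.3, 0.1, alpha), t_2star(0.3, 0.5, alpha))   # t* (x > p) vs t** (x < p)
print(a_star(0.3, alpha), a_star(0.7, alpha))              # drift is not symmetric
```

Each of the first three printed pairs should agree to rounding error, while
the last pair should differ, in line with the remark that $a^*(x)$ is not
symmetric about $x = \frac{1}{2}$.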

For arbitrary levels of dominance, (4.50) shows that with $p = (2N)^{-1}$,

$$t^*(x; (2N)^{-1}) = 2\Big(b(x)\psi(x)\int_0^1 \psi(y)\,dy\Big)^{-1} \int_0^x \psi(y)\,dy \int_x^1 \psi(y)\,dy, \tag{5.56}$$

where $\psi(y)$ is defined by (5.46). If this expression is written more fully
as $t^*(x; \alpha, h, (2N)^{-1})$, it follows that

$$t^*(x; \alpha, h, (2N)^{-1}) = t^*(1 - x; -\alpha, 1 - h, (2N)^{-1}). \tag{5.57}$$

This implies that conditional mean fixation time properties for a favored 
allele are the same as those for the corresponding disadvantageous allele, 
provided the dominance relation is reversed. This generalizes the conclusion 
just reached for the case of no dominance. 
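
The reflection property for general dominance can be checked by direct
quadrature. The sketch below is illustrative only: it assumes that the
function $\psi$ of (5.46) has the form $\exp\{-2\alpha h y + \alpha(2h - 1)y^2\}$
(the form consistent with the stationary distribution (5.72)), evaluates the
right-hand side of (5.56) with a composite Simpson rule, and compares the
value at $(x; \alpha, h)$ with that at $(1 - x; -\alpha, 1 - h)$. All function
names are hypothetical.

```python
import math


def psi(y, alpha, h):
    # Assumed form of (5.46): exp{-2*alpha*h*y + alpha*(2h - 1)*y^2}.
    return math.exp(-2 * alpha * h * y + alpha * (2 * h - 1) * y * y)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3


def t_star_new_mutant(x, alpha, h):
    """Right-hand side of (5.56) for an initial frequency p = 1/(2N) below x."""
    f = lambda y: psi(y, alpha, h)
    b = x * (1 - x)
    return (2 * simpson(f, 0.0, x) * simpson(f, x, 1.0)
            / (b * f(x) * simpson(f, 0.0, 1.0)))


# Favored allele with dominance h versus disfavored allele with dominance 1 - h.
print(t_star_new_mutant(0.3, 3.0, 0.2), t_star_new_mutant(0.7, -3.0, 0.8))
```

The two printed values should coincide up to quadrature and rounding error.
With $h = \frac{1}{2}$ the same routine should reproduce the closed form
$2\{e^{\alpha x} - 1\}\{e^{\alpha(1-x)} - 1\}[\alpha x(1 - x)\{e^{\alpha} - 1\}]^{-1}$.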

Using the notation (5.1), the fitnesses will display overdominance if
$sh > s > 0$ or underdominance if $sh < s < 0$. In either of these cases the
quasi-equilibrium point $x^*$, defined above, may be written more fully as
$x^* = x^*(h) = h/(2h - 1)$. Then (5.57) may be written as

$$t^*(x, x^*, (2N)^{-1}) = t^*(1 - x, 1 - x^*, (2N)^{-1}), \tag{5.58}$$

an equation first noted in this form by Nei and Roychoudhury (1973). 

For general levels of dominance it is no longer true (as it was with no
dominance) that $t^*(x, \alpha, h, (2N)^{-1}) = t^*(x, -\alpha, 1 - h, (2N)^{-1})$,
nor is it true that $a^*(x, \alpha, h) = a^*(x, -\alpha, 1 - h)$. There is,
however, one relation, first noted by Maruyama and Kimura (1974), that does
remain true. Keep the fitness scheme (5.1) fixed and consider two cases, one
where the initial frequency of $A_1$ is $(2N)^{-1}$ and the condition is made
that $A_1$ eventually fixes, and the other where the initial frequency of
$A_1$ is $1 - (2N)^{-1}$ and the condition is made that $A_1$ is eventually
lost. By considering $A_2$ rather than $A_1$ it is clear that the equation

$$t^*(1 - x, -\alpha, 1 - h, (2N)^{-1}) = t^{**}(x, \alpha, h, 1 - (2N)^{-1}) \tag{5.59}$$

must be true, and this may be used with (5.58) to show that

$$t^*(x, \alpha, h, (2N)^{-1}) = t^{**}(x, \alpha, h, 1 - (2N)^{-1}). \tag{5.60}$$

(We noted above the special case of this equation when $h = \frac{1}{2}$.) Thus the
mean time spent in any frequency range is the same for both processes. On 
the other hand, it is not true that a*(x) = a**(x) so that despite (5.60), the 
two processes have quite different properties. Again, these perhaps para- 
doxical conclusions will be reconsidered, and to a large extent resolved, 
when we consider time-reversal properties of diffusion processes. 



5.5 One-Way Mutation 

Until now in this chapter we have ignored the possibility of recurrent
mutation from one allelic type to another when considering allele-frequency
behavior. In some circumstances this might cause little inaccuracy but
in general, especially from a macro-evolutionary rather than a
micro-evolutionary point of view, it is essential that mutation be taken into
account. In this section we make a start on this by supposing a model such
as (3.16) where $A_1$ mutates to $A_2$ (at rate $u$), with no reverse mutation.
The drift and diffusion coefficients for the diffusion process approximating
this Markov chain are, when there is no selection,

$$a(x) = -\tfrac{1}{2}\theta x, \quad b(x) = x(1 - x), \tag{5.61}$$

where $\theta = 4Nu$. Clearly $A_1$ eventually becomes lost from the population
and interest centers entirely on properties of the time until this loss occurs.
These properties are defined in large measure by the function $t(x;p)$, and
insertion of the coefficients (5.61) into (4.38) and (4.39) gives this function
immediately. The values so calculated are given in (3.19), where allowance
must be made for the fact that the time scale used there takes one generation
as the unit of time. Perhaps the case of most interest is when
$p = (2N)^{-1}$, so that to a close approximation, the mean time that $A_1$
exists in the population is

$$\bar{t}\{(2N)^{-1}\} \approx \int_{(2N)^{-1}}^{1} 2y^{-1}(1 - y)^{\theta - 1}\,dy \tag{5.62}$$

generations. This is of order $2\log(2N)$ generations for moderate values of
$\theta$: A new mutation $A_1$ will not, on average, remain in the population
for very long, or attain a high frequency, if there is no recurrent mutation
$A_2 \to A_1$.
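
As a rough numerical illustration of the order-of-magnitude claim, the sketch
below evaluates the integral $\int_{(2N)^{-1}}^{1} 2y^{-1}(1 - y)^{\theta - 1}\,dy$
of (5.62) by a midpoint rule after the substitution $w = \log y$. This is an
assumption-laden sketch rather than a production routine: the function name
and step count are arbitrary, and for $\theta < 1$ the integrand is singular
at $y = 1$, so a finer grid or a different rule would be needed there.

```python
import math


def mean_time_to_loss(theta, two_n, steps=100_000):
    """Midpoint-rule value of the integral in (5.62), in generations.

    With w = log(y), 2*y**-1*(1-y)**(theta-1) dy becomes
    2*(1-exp(w))**(theta-1) dw, which removes the 1/y singularity.
    """
    lower = -math.log(two_n)      # value of w at y = 1/(2N)
    width = -lower / steps        # upper limit is w = 0, that is y = 1
    total = 0.0
    for k in range(steps):
        w = lower + (k + 0.5) * width
        total += 2 * (1 - math.exp(w)) ** (theta - 1)
    return total * width


two_n = 10 ** 6
for theta in (1.0, 2.0):
    print(theta, mean_time_to_loss(theta, two_n), 2 * math.log(two_n))
```

For $\theta = 1$ the integral is exactly $2\log(2N)$, and for $\theta = 2$ it
is $2\{\log(2N) - 1 + (2N)^{-1}\}$, so both printed values are of order
$2\log(2N)$.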

The process we are considering, since it admits the possibility of only
two alleles, is perhaps of limited interest. However, several of its properties
throw considerable light on important features of the infinitely many alleles
model (3.72). Some of these were already given in (3.92) and (3.93). It is
clear that in the infinitely many alleles model, we may normally expect
several low-frequency alleles in the population. For example if $\theta = 1$,
$2N = 10^6$, there will typically be about fifteen alleles present in the
population at any time, and of these, typically about ten will have a
frequency less than 0.01. If $\theta$ is small enough, the most likely
situation is where there is one allele at high frequency together with several
alleles at a very low frequency. This is confirmed by observing, from (3.92),
that

$$\begin{aligned}
&\text{Prob (there exists an allele with frequency greater than 0.99)} \\
&\quad = \text{mean number of alleles with frequency greater than 0.99} \\
&\quad = \theta \int_{0.99}^{1} x^{-1}(1 - x)^{\theta - 1}\,dx \\
&\quad \approx \theta \int_{0.99}^{1} (1 - x)^{\theta - 1}\,dx \\
&\quad = (0.01)^{\theta}.
\end{aligned} \tag{5.63}$$

For $\theta = 0.1$ this probability is about 0.63. For larger values of
$\theta$ ($\theta > 1$ approximately) it becomes rather unlikely that such a
high-frequency allele will exist, and the most likely configuration is one
where a number of alleles exist at low but unequal frequencies. In all cases
the least likely situation is one where two, three or four alleles exist with
approximately equal frequencies. These arguments suggest an approach to
testing whether a neutral model such as (3.72) and not, for example, one
involving selection is adequate to explain observed allelic frequencies. We
take up this question at greater length in Chapter 9 when discussing the
concept of polymorphism and in Chapter 11 when considering tests for
neutrality using allele frequency data.
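
The numbers quoted in this discussion are easy to reproduce. The short sketch
below assumes the frequency spectrum $\theta x^{-1}(1 - x)^{\theta - 1}$ used
in (5.63); for $\theta = 1$ this reduces to $x^{-1}$, whose integral over a
frequency range is a logarithm. The function names are illustrative.

```python
import math


def prob_dominant_allele(theta, cutoff=0.99):
    """Last line of (5.63): approximate probability of an allele above `cutoff`."""
    return (1 - cutoff) ** theta


def mean_alleles_theta_one(lo, hi):
    """Mean number of alleles with frequency in (lo, hi) when theta = 1."""
    return math.log(hi / lo)


print(prob_dominant_allele(0.1))              # about 0.63
print(mean_alleles_theta_one(1e-6, 1.0))      # alleles present when 2N = 10^6
print(mean_alleles_theta_one(1e-6, 0.01))     # those with frequency below 0.01
```

Under these assumptions the last two values are roughly 13.8 and 9.2, in
rough agreement with the "about fifteen" and "about ten" quoted above.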

There are two further points that are of interest in considering the model
(3.72) and its evolutionary behavior. The first concerns the nature of the
boundary $x = 1$ for the two allele model originally considered. Use of (5.61)
in (4.65) shows that this boundary is an entrance boundary if $\theta \ge 1$.
This implies that in this case, it is impossible to reach this boundary by
diffusion from the interior of (0, 1). It is therefore impossible to consider
behavior conditional on the requirement that this boundary is reached, and
further it is unnecessary to impose the condition that the boundary is not
reached and then consider conditional behavior: This latter condition is
already implicit and formulas such as (5.62) apply immediately. When
$\theta < 1$ the boundary $x = 1$ is regular and hence attainable and now new
behavior arises under the condition that this boundary is not reached. Again
assuming $p = (2N)^{-1}$ we find that (5.62) must be replaced, conditional on
$x = 1$ not being reached, by

$$\bar{t}\{(2N)^{-1}\} = \int_{(2N)^{-1}}^{1} 2y^{-1}(1 - y)^{1 - \theta}\,dy \tag{5.64}$$

generations. The integrand in (5.64) has the usual interpretation that its
integral over any frequency range provides the mean time that the allele
frequency spends in this range before the allele is eventually lost.

The second point concerns the frequency of the most frequent allele. The
argument that led to (5.63) shows that for 0.5 < x < 1, the probability
density function of the frequency of the most frequent allele in the infinitely
many alleles model is, at equilibrium,

$$f(x) = \theta x^{-1}(1 - x)^{\theta - 1}. \tag{5.65}$$

For values of $x$ less than 0.5 a deeper argument is clearly required:
Nevertheless, the probability density function of the most frequent allele can
be found for these values also (Watterson and Guess (1977), Watterson
(1976b)). Further details are given in Section 5.10. The use of (5.65) to
approximate the value of the monomorphism probability $P_{\text{mono}}$ is of
particular interest to us. This probability is defined by

$$P_{\text{mono}} = \text{Prob (only one allele present in the population in any generation at equilibrium)}. \tag{5.66}$$

If we make the approximation

$$P_{\text{mono}} \approx \int_{1 - (2N)^{-1}}^{1} f(x)\,dx, \tag{5.67}$$

then immediately from (5.65)

$$P_{\text{mono}} \approx (2N)^{-\theta}. \tag{5.68}$$

This approximation has been made on several occasions in the literature,
and in Section 5.7 we examine its accuracy.

In the Moran infinitely many alleles model an exact expression (see
(3.99)) is available for $P_{\text{mono}}$. In that expression the definition
of $\theta$ follows the standard Moran model definition given in (3.98). It is
interesting to use this expression, with however $\theta$ now defined in the
standard Wright-Fisher model form as $\theta = 4Nu$, to obtain a heuristic
Wright-Fisher model approximation

$$P_{\text{mono}} \approx (2N - 1)!/\{(1 + \theta)(2 + \theta) \cdots (2N - 1 + \theta)\}. \tag{5.69}$$

Although there is no justification for this heuristic approximation, we shall
see later that, surprisingly, (5.69) gives a better approximation to
$P_{\text{mono}}$ than does the more frequently used (5.68).



5.6 Two-Way Mutation 



Suppose now in the model (3.16) that mutation both from $A_1$ to $A_2$ (at
rate $u$) and from $A_2$ to $A_1$ (at rate $v$) occurs, with no selection. As
we have already seen, there will now exist a stationary distribution for the
frequency $x$ of $A_1$ for which we already have an exact expression (3.25)
for the mean and an approximate expression (3.26) for the variance.

Our aim now is to approximate the entire distribution by diffusion methods.
The drift and diffusion coefficients are found from (5.6) (putting
$\alpha = 0$) and (5.5) and then (4.45) leads to the stationary distribution

$$f(x) = \frac{\Gamma\{2\beta_1 + 2\beta_2\}}{\Gamma\{2\beta_1\}\Gamma\{2\beta_2\}}\, x^{2\beta_2 - 1}(1 - x)^{2\beta_1 - 1}. \tag{5.70}$$

The mean and variance of this distribution are $\beta_2/(\beta_1 + \beta_2)$
and $\beta_1\beta_2/\{(\beta_1 + \beta_2)^2(2\beta_1 + 2\beta_2 + 1)\}$
respectively, and these agree with the exact and approximate values given in
(3.25) and (3.26), once allowance is made for a change of scale.

This stationary distribution allows a third derivation of (3.27) and (3.28).
If $u = v$ and $4Nu = \theta$, then $2\beta_1 = 2\beta_2 = \theta$ and thus

$$f(x) = \frac{\Gamma(2\theta)}{\Gamma(\theta)\Gamma(\theta)}\, x^{\theta - 1}(1 - x)^{\theta - 1}.$$

From this, the probability that two genes drawn at random from the
population are of the same allelic type is

$$\int_0^1 f(x)\{x^2 + (1 - x)^2\}\,dx = \frac{1 + \theta}{1 + 2\theta}, \tag{5.71}$$

in agreement with (3.27) and (3.28).
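
The moments quoted for the stationary distribution can be verified from
standard Beta-function identities. The sketch below treats (5.70) as a Beta
density with parameters $(2\beta_2, 2\beta_1)$ for the frequency $x$ of $A_1$
and uses $E[x^i(1 - x)^j] = B(a + i, b + j)/B(a, b)$; the helper names are
illustrative and not taken from the text.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_moment(a, b, i, j):
    """E[x^i (1 - x)^j] for x distributed as Beta(a, b)."""
    return math.exp(log_beta(a + i, b + j) - log_beta(a, b))


def stationary_mean_var(beta1, beta2):
    """Mean and variance of (5.70), viewed as Beta(2*beta2, 2*beta1)."""
    a, b = 2 * beta2, 2 * beta1
    mean = beta_moment(a, b, 1, 0)
    var = beta_moment(a, b, 2, 0) - mean ** 2
    return mean, var


def homozygosity(theta):
    """Left-hand side of (5.71) when 2*beta1 = 2*beta2 = theta."""
    return beta_moment(theta, theta, 2, 0) + beta_moment(theta, theta, 0, 2)


for theta in (0.1, 1.0, 10.0):
    print(theta, homozygosity(theta), (1 + theta) / (1 + 2 * theta))
```

The two printed columns should agree for every $\theta$, which is the content
of (5.71).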

The general form of the stationary distribution (5.70) is clear. For small
$\beta_1$ and $\beta_2$, that is small mutation rates and/or population sizes,
most of the probability mass is in the extremes of the distribution, so that
the most likely situation is one where one or other allele is at a low
frequency or is even temporarily absent from the population. When $\beta_1$
and $\beta_2$ are large the variance becomes small and the behavior is
“quasi-deterministic”: Only small deviations are likely from the deterministic
theory equilibrium point. This can be illustrated by supposing $u = v$ so that
the mean of the stationary distribution is 0.5 irrespective of the population
size. Thus supposing $u = v = 2.5 \times 10^{-6}$, the stationary probability
that the frequency of $A_1$ is between 0.4 and 0.6 rises from 0.2 for
$N = 10^5$ to 0.8 for $N = 10^6$ and is essentially unity for $N = 10^7$ or
more.

The probability distribution (5.70) allows no atoms of probability at the 
boundaries x = 0, x = 1. This is in accordance with what is expected from 
the coefficients (5.5) and (5.6), which suggest instantaneous reflection from 
these boundaries. Nevertheless for the discrete process (3.16) there must 
exist nonzero stationary probabilities for the states {0} and { 2N }, and if u 
and v are small these probabilities will be quite large. It therefore becomes 
a matter of some interest to find how these boundary probabilities can be 
approximated from (5.70). This matter is taken up in the next section. 

When selection is also allowed, together with two-way mutation, there
will still exist a stationary distribution, although its form is naturally more
complicated than that in (5.70). Use of the complete expressions (5.5) and
(5.6) gives, from (4.45), the formula

$$f(x) = \text{const}\; x^{2\beta_2 - 1}(1 - x)^{2\beta_1 - 1}\exp\{2\alpha h x - \alpha(2h - 1)x^2\} \tag{5.72}$$

for this distribution, where the constant is a function of $\beta_1$,
$\beta_2$, $\alpha$, and $h$ and may be found in principle by normalization.
Since $\beta_1 = 2Nu$, $\beta_2 = 2Nv$, the expression in (5.72) is identical
to that in (1.68). The form of this distribution is of most interest when
overdominance occurs or when one allele is at a selective advantage to the
other. The former case was discussed in some detail below (1.68), so we
consider here only the latter. Assume for definiteness that $s < sh < 0$ so
that $A_1$ is at a selective disadvantage and consequently usually at a low
frequency. Assuming then that $x$ is small, we may ignore the term
$(1 - x)^{2\beta_1 - 1}$ as well as the term in $x^2$ in the exponent in
(5.72) to get

$$f(x) \approx \text{const}\; x^{2\beta_2 - 1}\exp(2\alpha h x).$$

Since $x$ is a frequency, its value cannot exceed 1. However for sufficiently
large values of $|\alpha h|$ ($|\alpha h| > 3$ should normally suffice), the
function $f(x)$ is negligibly small when $x > 1$, and the normalizing constant
may be evaluated to a sufficient approximation by supposing $0 < x < \infty$.
This leads to

$$f(x) \approx |2\alpha h|^{2\beta_2}\{\Gamma(2\beta_2)\}^{-1} x^{2\beta_2 - 1}\exp\{-|2\alpha h|x\}. \tag{5.73}$$

From this the mean and variance of the stationary distribution of the
frequency of $A_1$ are found to be, approximately,

$$\beta_2/|\alpha h|, \quad \tfrac{1}{2}\beta_2/(\alpha h)^2 \tag{5.74}$$

respectively. Allowing for changes in notation, the mean value agrees with
the deterministic equilibrium point (1.36), while the variance provides new
information and gives some idea of the extent of stochastic variation that
can be expected around the mean. Parallel values may be calculated when
$s > sh > 0$.



5.7 Diffusion Approximations and Boundary 
Conditions 

The aim of this section is to consider the extent to which various formulae 
derived from, or suggested by, diffusion methods provide accurate approx- 
imations to the true but unknown corresponding values for the Markov 
chain specified by (3.16) and (3.29). It must of course be kept in mind that 
this brings us no closer to an evaluation of how close our results are to 
“reality” : The diffusion process may well provide a better reflection of real 
population processes than does the Markov chain model. 

If one adopts the view that the primary process of interest is a Markov 
chain such as (3.16), approximating diffusion formulae can usually be ob- 
tained in two different ways. The first is to approximate the Markov chain 
process by a diffusion process by finding the appropriate drift and diffu- 
sion coefficients, using the theory of Section 5.1, to calculate the required 
quantity for this diffusion process, and then to use the value so found as 
an approximation for the Markov chain. The second way is more direct, 
and was used several times in Chapter 3: There is no concept of an ap- 
proximating diffusion process, and the quantity of interest is approximated 
by considering only the leading terms in a Taylor series expansion. The 
two approaches give the same formulas (compare, for example, (3.5) and 
(5.19)), and an extension of the second approach, using higher-order Taylor 
series approximations, leads to an assessment of the accuracy of diffusion 
formulas, using standard techniques. 

Consider for example the diffusion approximation (5.47) for the probability
of fixation of a favored allele $A_1$ in the absence of dominance and
mutation. This equation was also found, as (3.31), without using diffusion
theory. Now (3.31) was reached by ignoring certain small-order terms in
the derivation of (3.30), and a formula somewhat more accurate than (3.30)
is found from

$$0 = \sum_{i=1}^{4} E(\delta x)^i\, \pi^{(i)}(x)/i!, \tag{5.75}$$

as this equation incorporates terms of order $N^{-2}$ as well as terms of order
$N^{-1}$. An even better approximation would arise by replacing the derivatives
in (5.75) by finite differences. If we now put

$$\pi(x) = \{1 - \exp(-\alpha x)\}\{1 - \exp(-\alpha)\}^{-1} + N^{-1}g(x) \tag{5.76}$$

in (5.75), a second order differential equation for $g(x)$ will be obtained.
Since $g(0) = g(1) = 0$, this equation can be solved for $g(x)$ and thus a
correction term to $\pi(x)$ of order $N^{-1}$ obtained. More details are given
in Ewens (1964).

Similar corrections may be made to the mean absorption times, although
here difficulties arise for very large or very small values of $p$, since the
higher derivatives of $\bar{t}(p)$, defined by (3.5) or (5.19), become
increasingly large for these $p$ values. Nevertheless, even for
$p = (2N)^{-1}$ the diffusion approximation (3.6) is remarkably accurate: A
more precise value, found in Fisher (1958, p. 98) and confirmed by Watterson
(1975), is

$$\bar{t}\{(2N)^{-1}\} = 1.355076 + 2\log 2N \tag{5.77}$$

generations. An almost identical correction occurs when there is one-way
mutation (cf. (5.62)).

It is also possible to consider corrections to complete distributions arrived
at by diffusion theory, in particular to the stationary distribution (4.45).
Here it must first be decided in what way the diffusion formula is used as
an approximation to a discrete distribution. If the stationary distribution
$\{\alpha_i\}$ defined below (3.24) is approximated by a continuous
distribution $f(x)$, both approximations

$$\alpha_i \approx \text{const} \int_{(i - 1/2)/2N}^{(i + 1/2)/2N} f(x)\,dx, \quad \alpha_i \approx \text{const}\, f(i/2N) \tag{5.78}$$

could be used. Although in general the two approximations will give similar
values for moderate values of $i$, problems can arise with both definitions at
$i = 0$ and $i = 2N$. Thus for the stationary distribution (5.72), the latter
definition leads to a zero or infinite value for $\alpha_0$, both clearly
unsatisfactory. However, the first approximation in (5.78) requires
adjustments to the terminals of integration for $i = 0$ and $i = 2N$. Further,
the integration involved often cannot be completed exactly and numerical
methods, which reduce to evaluation of $f(x)$ at discrete point values, are
then required. Altogether the best way to view the diffusion approximation is
probably to use the second approximation in (5.78) for all values of $i$ other
than 0 and $2N$ and to estimate $\alpha_0$ and $\alpha_{2N}$ through the
stationary equations

$$\alpha_0 = \sum_j \alpha_j p_{j,0}, \quad \alpha_{2N} = \sum_j \alpha_j p_{j,2N}. \tag{5.79}$$

The constant is now chosen so that $\sum \alpha_i = 1$. This approach normally
leads to quite accurate diffusion approximations to the stationary
distribution of Wright-Fisher models.

In considering approximations to $\alpha_0$ and $\alpha_{2N}$, Wright (1931,
p. 123), (1969, pp. 356-357) replaced (5.79) by the approximation

$$2Nv\alpha_0 \approx \tfrac{1}{2}\alpha_1, \quad 2Nu\alpha_{2N} \approx \tfrac{1}{2}\alpha_{2N-1}. \tag{5.80}$$

These approximations were suggested by parallel approximations for the 
asymptotic conditional distribution lj (see (1.54)), considered in great 
detail by Wright and Fisher. In subsequent work Wright did not regard 
the approximations in (5.80) as necessarily being accurate, and checked 
carefully in each case whether they were reasonable. Unfortunately these 
approximations have often been used uncritically by other authors, and this 
has led to rather inaccurate expressions for $\alpha_0$ and $\alpha_{2N}$.
Similar uncritical use of the approximation

$$\alpha_0 \approx \int_0^{(2N)^{-1}} f(x)\,dx, \quad \alpha_{2N} \approx \int_{1 - (2N)^{-1}}^{1} f(x)\,dx \tag{5.81}$$

has also led to estimates of large (relative) error.

This latter point may be illustrated by discussing approximations to the
quantity $P_{\text{mono}}$, defined in Section 5.5, for the infinitely many
alleles model (3.72). The approximation (5.68) for this quantity can be
reached by using (5.81) as a starting point: This was essentially the approach
of Kimura (1971), who computed the corresponding value in the $K$-allele model
(3.68) and then let $K \to \infty$. This approach, however, uses diffusion
approximations for precisely those values where they are most suspect, and a
more detailed computation of $P_{\text{mono}}$ is needed. This was provided by
Watterson (1975), who arrived at the approximation

$$P_{\text{mono}} \approx \exp(-0.1003\theta)\,\Gamma(1 + \theta)(2N)^{-\theta}. \tag{5.82}$$

We have already noted the approximation (5.69) derived formally by
putting $i = 2N$ in (3.78). Table 5.2 displays exact values of
$P_{\text{mono}}$ for the case $2N = 1000$ found numerically, as well as the
approximate values found from (5.82), (5.69), and (5.68). Clearly (5.82) gives
an excellent approximation for all values of $\theta$ considered while (5.68)
gives a good approximation only when $\theta \le 1$.
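
The three approximate rows of Table 5.2 can be regenerated directly from the
formulas. The sketch below is a hedged illustration (the function names are
assumptions, and the exact row of the table is not recomputed here); it works
in logarithms so that the very small values at large $\theta$ do not lose
precision.

```python
import math


def p_mono_watterson(theta, two_n):
    """Approximation (5.82): exp(-0.1003*theta) * Gamma(1+theta) * (2N)^(-theta)."""
    return math.exp(-0.1003 * theta + math.lgamma(1 + theta)
                    - theta * math.log(two_n))


def p_mono_moran_form(theta, two_n):
    """Approximation (5.69): (2N-1)! / {(1+theta)(2+theta)...(2N-1+theta)}."""
    return math.exp(sum(math.log(j / (j + theta)) for j in range(1, two_n)))


def p_mono_simple(theta, two_n):
    """Approximation (5.68): (2N)^(-theta)."""
    return two_n ** (-theta)


two_n = 1000
for theta in (0.1, 0.5, 1, 5, 10):
    print(theta,
          p_mono_watterson(theta, two_n),
          p_mono_moran_form(theta, two_n),
          p_mono_simple(theta, two_n))
```

Each printed row gives, for one value of $\theta$, the values from (5.82),
(5.69) and (5.68) in that order.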

$\theta$           0.1      0.5              1                5                 10

exact             0.472    0.267 x 10^-1    0.902 x 10^-3    0.669 x 10^-13    0.979 x 10^-24
approx. (5.82)    0.472    0.267 x 10^-1    0.905 x 10^-3    0.727 x 10^-13    1.331 x 10^-24
approx. (5.69)    0.477    0.280 x 10^-1    1.000 x 10^-3    1.188 x 10^-13    3.470 x 10^-24
approx. (5.68)    0.501    0.316 x 10^-1    1.000 x 10^-3    0.010 x 10^-13    1.000 x 10^-30

Table 5.2. Values of $P_{\text{mono}}$ for $2N = 1000$ and various $\theta$

In considering diffusion approximations it has been assumed above that
the population size, mutation rate, and selective differences are all such that
the parameters $\alpha$, $\beta_1$, and $\beta_2$ in (5.2) are $O(1)$. Not only
is this condition rather imprecise: It may also not apply in several cases of
interest. An attempt to overcome this problem has been made by Ethier and
Norman (1977), who provide bounds on diffusion approximations irrespective of
the order of magnitude assumptions discussed above. More specifically, for the
model defined by (3.16) and (3.24), Ethier and Norman provide an explicit
upper bound for the difference between the expectation of any infinitely
differentiable function of the gene frequency $x$ and the value as calculated
from the approximating diffusion process, for any values of $N$, $u$ and $v$.
This bound is uniform over time and thus applies also to the stationary
distribution. For further details see Ethier and Norman (1977), in particular
their equation (7).

An interesting case arises for the heterozygosity measure $2x(1 - x)$. The
stationary expectation of this quantity for the diffusion process may be
found immediately from (5.70). If we assume for convenience that
$2\beta_1 = 2\beta_2 = \theta$, the value found for this expectation is
$\theta/(2\theta + 1)$. However, the stationary expectation for the Markov
chain defined by (3.16) and (3.24) can be found exactly and is, explicitly,

$$\frac{\theta(1 - u)\{1 - (2N)^{-1}\}}{2\theta - 2u\theta + (1 - 2u)^2}.$$

The difference between this and the diffusion approximation is

$$\frac{u\{2 - 2u + \theta\}}{(2\theta + 1)\{2\theta - 2u\theta + (1 - 2u)^2\}}.$$

For the Ethier and Norman theory the upper bound provided for the error
in the diffusion approximation is found by applying their equation (7) for
the function $2x(1 - x)$. The bound is

$$\max(u, v) + (4N)^{-1} + 27\max(u^2, v^2) + (7/16)N^{-2}.$$

In our case this may be written

$$u\{1 + \theta^{-1} + 27u + 7u\theta^{-2}\}$$

and it is not hard to verify that this function does indeed bound the exact
error in the diffusion approximation given above.
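
The verification mentioned above can be sketched numerically. The code below
compares the diffusion value $\theta/(2\theta + 1)$ with the exact stationary
expectation, checks the stated difference, and evaluates the Ethier and Norman
bound with $u = v$ and $\theta = 4Nu$. One ingredient is an assumption on our
part and is labeled as such in the code: a one-generation recursion for
$H = E\{2x(1 - x)\}$ under binomial sampling with symmetric mutation, whose
fixed point gives an independent check of the exact expression. The parameter
values are arbitrary.

```python
def diffusion_value(theta):
    """Stationary E[2x(1-x)] for the diffusion, from (5.70) with 2*beta = theta."""
    return theta / (2 * theta + 1)


def exact_value(n, u):
    """Exact stationary expectation quoted in the text (u = v, theta = 4*N*u)."""
    theta = 4 * n * u
    return (theta * (1 - u) * (1 - 1 / (2 * n))
            / (2 * theta - 2 * u * theta + (1 - 2 * u) ** 2))


def recursion_fixed_point(n, u):
    # Assumption (not taken from the text): with binomial sampling of 2N genes
    # and symmetric mutation, H' = c * ((1 - 2u)^2 * H + 2u(1 - u)), c = 1 - 1/(2N).
    c = 1 - 1 / (2 * n)
    return c * 2 * u * (1 - u) / (1 - c * (1 - 2 * u) ** 2)


def stated_difference(n, u):
    """Difference (diffusion minus exact) as displayed in the text."""
    theta = 4 * n * u
    return (u * (2 - 2 * u + theta)
            / ((2 * theta + 1) * (2 * theta - 2 * u * theta + (1 - 2 * u) ** 2)))


def ethier_norman_bound(n, u, v):
    """max(u, v) + (4N)^-1 + 27 max(u^2, v^2) + (7/16) N^-2."""
    return max(u, v) + 1 / (4 * n) + 27 * max(u * u, v * v) + (7 / 16) / n ** 2


n, u = 1000, 2.5e-4          # illustrative values giving theta = 1
error = diffusion_value(4 * n * u) - exact_value(n, u)
print(error, stated_difference(n, u), ethier_norman_bound(n, u, u))
```

For these illustrative values the actual error is several times smaller than
the bound, so the bound holds comfortably here.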
A further remark should be made about order of magnitude assumptions.
The diffusions we have considered assume that $E(\delta x)$ for the Markov
chain (3.16) is of the same order of magnitude as $\text{var}(\delta x)$,
namely $N^{-1}$. This assumption is not always justified, for example if in
the model defined by (3.16) and (3.24) $N$ and $u$ are jointly large enough
that $Nu$ is not small and cannot be taken as of order unity. One case of
special interest is that when $E(\delta x)$ is of order $\varepsilon$ and
$N\varepsilon$ is large. Processes for which this is the case have been
discussed by Karlin and McGregor (1974) and Norman (1974, 1975a). For these
processes the gene frequency clusters around its deterministic value given by
the infinite population theory outlined in Chapter 1. Deviations of gene
frequency from this value are, asymptotically, normally distributed with
standard deviation of order $(N\varepsilon)^{-1/2}$. For certain parameter
values this diffusion approximation and the one we have discussed earlier
overlap. This situation is analogous to the overlapping domains of
applicability of the Poisson and normal approximations to the binomial
distribution in statistical theory, and in such cases the two approximations
approximate not only the discrete process but also each other.

We observe in conclusion that diffusion theory can give a quite false
impression not only quantitatively but also qualitatively about the boundary
behavior in some Markov chains, and illustrate this by considering the
stationary distributions (5.72). The criterion in (4.65) shows that the
boundary $x = 0$ is unattainable if $\beta_2 \ge \frac{1}{2}$, that is if the
mutation rate $A_2 \to A_1$ is sufficiently large. Further, this conclusion
remains unchanged whatever the selective parameters. Suppose now
$\alpha \ll \alpha h \ll 0$. The discussion centered around (5.73) shows that
the frequency of $A_1$ will usually be very small, and in a Markov chain model
such as that defined by (3.16) and (3.29) there will be a substantial
probability that at any time $A_1$ is absent from the population, in contrast
to the diffusion theory prediction. Thus suppose $2N = 10^6$,
$v = \frac{1}{2} \times 10^{-6}$, $s = -0.2$, $sh = -0.1$ so that
$\beta_2 = 0.5$, $\alpha = -2 \times 10^5$. Equation (5.74) shows that the mean
and standard deviation of the number of $A_1$ genes in the population at any
time are 5 and 5, respectively, and this certainly suggests a non-negligible
probability that there are no $A_1$ genes present. The distribution (5.73)
implies that for small positive integers $i$, the stationary probability of
$i$ $A_1$ genes is approximately $0.2\exp(-0.2i)$, and the approximation (5.79)
suggests that the stationary probability of the value 0 is about 0.13. This is
not negligibly small, and we conclude that the diffusion theory boundary
behavior gives a rather misleading picture of the behavior of the Markov chain
for very small numbers, including zero, of $A_1$ genes.
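
The arithmetic in this example can be retraced in a few lines. The sketch
below assumes the definitions $\beta_2 = 2Nv$ and $\alpha h = 2N \cdot sh$ and
uses the gamma-type approximation (5.73) with its moments (5.74); the function
names are illustrative.

```python
import math


def approx_moments(beta2, alpha_h):
    """Mean and variance of the A1 frequency from (5.74)."""
    return beta2 / abs(alpha_h), 0.5 * beta2 / alpha_h ** 2


def approx_density(x, beta2, alpha_h):
    """Density (5.73): a gamma density with shape 2*beta2 and rate |2*alpha*h|."""
    shape, rate = 2 * beta2, abs(2 * alpha_h)
    return rate ** shape / math.gamma(shape) * x ** (shape - 1) * math.exp(-rate * x)


two_n, v, sh = 10 ** 6, 0.5e-6, -0.1
beta2 = two_n * v          # 2Nv = 0.5
alpha_h = two_n * sh       # 2N * sh = -1e5

mean, var = approx_moments(beta2, alpha_h)
print(two_n * mean, two_n * math.sqrt(var))   # mean and s.d. of the number of A1 genes

# Probability of i copies, approximated by f(i/2N)/2N, against 0.2*exp(-0.2*i).
for i in (1, 2, 5):
    print(i, approx_density(i / two_n, beta2, alpha_h) / two_n, 0.2 * math.exp(-0.2 * i))
```

Here the density is evaluated at $x = i/2N$ and divided by $2N$, in the spirit
of the second approximation in (5.78) with the constant taken as $(2N)^{-1}$.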







5.8 Random Environments 

So far in this book it has been supposed that stochastic changes in gene
frequencies have been brought about solely by random sampling effects in
finite populations. There are however further sources of stochastic variation,
and perhaps the most important of these is that brought about by random
temporal changes in the selection parameters, due perhaps to fluctuations
in the environment. Models for this form of randomness were introduced
by Kimura (1954), who analyzed (somewhat incorrectly) a model in which
the selection parameters $w_{ij}$ in the (infinite) population model (1.24) are
of the form

$$w_{11} = 1, \quad w_{12} = 1 - s, \quad w_{22} = (1 - s)^2. \tag{5.83}$$

Here $s$ is a random variable with mean $\delta\mu$, variance
$\delta\sigma^2$, and with higher moments $O(\delta^2)$ or less. It is assumed
that $\delta$ is a small parameter, so that small-order terms in $\delta$ are
ignored.

Let $s_t$ be the value assumed by the random variable $s$ in generation $t$.
Then with a slight change of notation, (1.24) can be written, for this model,
as

$$x_{t+1} = x_t/\{1 - s_t(1 - x_t)\}. \tag{5.84}$$

If $y_t = x_t/(1 - x_t)$, this becomes

$$y_{t+1} = y_t(1 - s_t)^{-1}, \tag{5.85}$$

and putting $z_t = \log y_t$, (5.85) leads to

$$z_t = z_0 - \sum_{i=0}^{t-1} \log(1 - s_i). \tag{5.86}$$

Suppose now that the $s_i$ are independently and identically distributed
random variables. Then apart from the constant $z_0$, $z_t$ is the sum of
independently and identically distributed random variables and thus the central
limit theorem may be applied. Since

$$E\{\ln(1 - s)\} \approx E\{-(s + \tfrac{1}{2}s^2)\} = -\delta(\mu + \tfrac{1}{2}\sigma^2),$$
$$\text{var}\{\ln(1 - s)\} \approx \text{var}(-s) = \delta\sigma^2,$$

$z_t$ will have an approximate normal distribution with mean

$$z_0 + t\delta(\mu + \tfrac{1}{2}\sigma^2) = \mu_z \tag{5.87}$$



and variance

$$t\delta\sigma^2 = \sigma_z^2. \tag{5.88}$$

A standard statistical transformation now gives the corresponding density
function for $x_t$ as

$$f(x;t) = \frac{1}{\sqrt{2\pi}\,\sigma_z\, x(1 - x)} \exp\left[-\frac{1}{2}\left\{\log\frac{x}{1 - x} - \mu_z\right\}^2 \Big/ \sigma_z^2\right]. \tag{5.89}$$

This conclusion is reached by more or less exact methods, the only
approximation involved being the normal distribution assumption justified by
the central limit theorem. We now show how it can be obtained by diffusion
methods. From (5.84),

$$E(\delta x) = E\{sx(1 - x) + s^2x(1 - x)^2 + O(s^3)\} = \delta x(1 - x)\{\mu + \sigma^2(1 - x)\} + O(\delta^2), \tag{5.90}$$

$$E(\delta x)^2 = \delta\sigma^2x^2(1 - x)^2 + O(\delta^2), \tag{5.91}$$

with higher moments $O(\delta^2)$ or less. Equations (5.90) and (5.91) provide
the drift and diffusion coefficients for an approximating diffusion process.
Several authors have incorrectly omitted the term in $\sigma^2$ in
$E(\delta x)$ and thus have obtained incorrect solutions to the diffusion
equation. The forward Kolmogorov equation for this process then becomes

$$\frac{\partial}{\partial t} f(x;t) = -\frac{\partial}{\partial x}\{\delta x(1 - x)\{\mu + \sigma^2(1 - x)\}f(x;t)\} + \frac{1}{2}\frac{\partial^2}{\partial x^2}\{\delta\sigma^2x^2(1 - x)^2 f(x;t)\}. \tag{5.92}$$

It may be checked by substitution that the solution of (5.92) is (5.89), and
thus the diffusion approximation leads to exactly the same solution as the
central limit theorem approximation.
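
A small simulation gives a feel for the central limit argument. The sketch
below iterates the recursion (5.84) with independent selection coefficients
and compares the sample mean and variance of $z_t - z_0$ with
$t\delta(\mu + \frac{1}{2}\sigma^2)$ and $t\delta\sigma^2$ from (5.87) and
(5.88). The choice of a uniform distribution for $s$, the parameter values and
the function name are all assumptions made for illustration; any distribution
with mean $\delta\mu$ and variance $\delta\sigma^2$ and suitably small higher
moments could be substituted.

```python
import math
import random


def simulate_logit_change(t, delta, mu, sigma2, reps, seed=1):
    """Sample mean and variance of z_t - z_0 under the recursion (5.84)."""
    rng = random.Random(seed)
    mean_s = delta * mu
    half_width = math.sqrt(3 * delta * sigma2)   # uniform with variance delta*sigma2
    x0 = 0.5
    z0 = math.log(x0 / (1 - x0))
    changes = []
    for _rep in range(reps):
        x = x0
        for _gen in range(t):
            s = rng.uniform(mean_s - half_width, mean_s + half_width)
            x = x / (1 - s * (1 - x))            # recursion (5.84)
        changes.append(math.log(x / (1 - x)) - z0)
    mean = sum(changes) / reps
    var = sum((c - mean) ** 2 for c in changes) / (reps - 1)
    return mean, var


t, delta, mu, sigma2 = 100, 0.01, 0.5, 1.0
mean, var = simulate_logit_change(t, delta, mu, sigma2, reps=2000)
print(mean, t * delta * (mu + 0.5 * sigma2))   # compare with (5.87)
print(var, t * delta * sigma2)                 # compare with (5.88)
```

The agreement is only approximate, both because of sampling error and because
terms of higher order in $\delta$ are ignored in (5.87) and (5.88).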

We make several remarks concerning the solution (5.89). First, if fi + 
\u 2 > 0, the density function (5.89) increasingly concentrates near x — 1 as 
t — > oo, while for (i + \a 2 < 0 it concentrates increasingly near x = 0. This 
behavior was termed “quasi-fixation” by Kimura (1954), and was defined 
more rigorously by Karlin and Liberman (1974) through the concept of 
“stochastic local stability”. An equilibrium x* is said to be stochastically 
locally stable (Karlin and Liberman (1974)) if for any 7 > 0 there exists a 
neighborhood x* — £, x* + £ such that for any initial frequency x\ in this 
neighborhood, 

Prob ( lim x n = x*) > 1 — 7. (5.93) 

In— >00 J 



Thus in the model (5.83) the boundary x = 1 is stochastically locally stable 
if ji + \o 2 > 0 and the boundary x = 0 is stochastically locally stable if 
/ i + \o 2 < 0 . 

This shows that even when $\mu = 0$, so that “on the average” $A_1$ and
$A_2$ have equal fitnesses, the density function of the frequency $x$ of $A_1$
still concentrates increasingly near $x = 1$. This reveals an important new
observation: The variance in fitness of any genotype over time is just as
important as the (arithmetic) mean fitness in determining evolutionary
behavior. Further, if two alleles have equal arithmetic mean fitnesses, the
allele with the smaller variance in fitness is in effect selectively favored.
The true fitness is in fact measured best by the geometric mean fitness. In the
above case, $A_1$ and $A_2$ are selectively equivalent only if
$\mu + \frac{1}{2}\sigma^2 = 0$, and this corresponds to the fact that when
terms of order $\delta^2$ are ignored, $\delta(\mu + \frac{1}{2}\sigma^2)$ is the
geometric mean selective disadvantage of $A_2$.
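The role of the geometric mean can be illustrated numerically. The sketch below is not taken from the text. It assumes, purely for illustration, a two-point distribution for $s$ (the values $\delta\mu \pm \sqrt{\delta\sigma^2}$ with equal probability, which has mean $\delta\mu$, variance $\delta\sigma^2$ and higher central moments of order $\delta^2$ or less) and compares the exact value of $E\log(1 - s)$ with the approximation $-\delta(\mu + \frac{1}{2}\sigma^2)$. The function names are hypothetical.

```python
import math


def mean_log_fitness_exact(mu, sigma2, delta):
    """E log(1 - s) when s = delta*mu +/- sqrt(delta*sigma2), each with probability 1/2.

    The two-point distribution is an illustrative assumption, not the book's.
    """
    mean = delta * mu
    spread = math.sqrt(delta * sigma2)
    return 0.5 * (math.log1p(-(mean + spread)) + math.log1p(-(mean - spread)))


def mean_log_fitness_approx(mu, sigma2, delta):
    """Leading-order value -delta*(mu + sigma2/2), ignoring terms of order delta**2."""
    return -delta * (mu + 0.5 * sigma2)


if __name__ == "__main__":
    for mu in (0.0, 0.5):
        exact = mean_log_fitness_exact(mu, 1.0, 0.01)
        approx = mean_log_fitness_approx(mu, 1.0, 0.01)
        print(mu, exact, approx)
```

With $\mu = 0$ the arithmetic mean of $1 - s$ is exactly 1, yet the mean log fitness is negative, which is the sense in which the allele with the variable fitness is at a disadvantage.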

Finally, the solution (5.89), in contrast to other solutions such as (5.11) of 
diffusion equations, does not have the form of an eigenfunction expansion. 
This is confirmed by the theory of Section 4.7. In the above process, appli- 
cation of (4.65) shows that the boundaries x = 0, x = 1 are natural, and 
for such boundaries it is not necessary that the solution be in eigenfunction 
form. 

A second selection scheme perhaps reveals more of the flavor of random 
environment models. Suppose the fitnesses in each generation are of the 
form 



$$w_{11} = 1 + s, \quad w_{12} = 1, \quad w_{22} = 1 + \alpha s \tag{5.94}$$

where $0 < \alpha \le 1$. This model has been studied in detail by Karlin and
Liberman (1974) in discrete time and Levikson and Karlin (1975) in the
diffusion case. The conclusions are analogous in the two situations, and we
present only several discrete-time results. We again assume $s$ is a random
variable having mean $\delta\mu$, variance $\delta\sigma^2$, and higher moments
$O(\delta^2)$ or less.

Suppose first that $\alpha = 1$. In this case the geometric mean fitness of
the heterozygote is 1, whereas that of each homozygote is, to the order of
approximation used, $1 + \delta\mu - \frac{1}{2}\delta\sigma^2$. If $\mu = 0$ the
homozygotes have the same arithmetic mean fitness as the heterozygote but a
lower geometric mean fitness and thus, from the previous discussion, can be
regarded as being at a selective disadvantage to the heterozygote. This is
confirmed by Karlin and Liberman (1974), who show that with probability one,
each trajectory of gene frequency converges to 0.5. This behavior occurs even
for positive $\mu$ so long as $\mu - \frac{1}{2}\sigma^2 < 0$. Thus 0.5 is
stochastically locally stable for this case.

When $\mu - \frac{3}{4}\sigma^2 < 0 < \mu - \frac{1}{2}\sigma^2$ it is possible
for the frequency of $A_1$ to converge to 0, to 0.5, or to 1. Which type of
convergence occurs depends on the initial frequency of $A_1$. If
$\mu - \frac{3}{4}\sigma^2 > 0$ the only two limiting possibilities are
convergence of the frequency of $A_1$ to 0 or 1. In all cases the actual
convergence behavior is not deterministic in the sense that it will depend 
on the values taken by s in the early generations. 
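The thresholds $\frac{1}{2}\sigma^2$ and $\frac{3}{4}\sigma^2$ for $\mu$ can be recovered by linearizing the recurrence near each candidate limit point. The following is only a sketch under the moment assumptions made on $s$, using the $E\log$ criterion for stochastic local stability. It is not a reproduction of Karlin and Liberman's argument.

```latex
\begin{align*}
x_{n+1} &= \frac{x_n(1 + s_n x_n)}{1 + s_n\{x_n^2 + (1 - x_n)^2\}}
  && \text{(model (5.94) with } \alpha = 1\text{)},\\[4pt]
x_{n+1} &\approx x_n (1 + s_n)^{-1}
  && (x_n \text{ near } 0),\\
x_{n+1} - \tfrac{1}{2} &\approx \bigl(x_n - \tfrac{1}{2}\bigr)\,
  \frac{1 + s_n}{1 + \tfrac{1}{2}s_n}
  && (x_n \text{ near } \tfrac{1}{2}),\\[4pt]
E\log(1 + s) &= \delta\bigl(\mu - \tfrac{1}{2}\sigma^2\bigr) + O(\delta^2),\\
E\log\frac{1 + s}{1 + \tfrac{1}{2}s}
  &= \delta\bigl(\mu - \tfrac{1}{2}\sigma^2\bigr)
   - \delta\bigl(\tfrac{1}{2}\mu - \tfrac{1}{8}\sigma^2\bigr) + O(\delta^2)
   = \tfrac{1}{2}\delta\bigl(\mu - \tfrac{3}{4}\sigma^2\bigr) + O(\delta^2).
\end{align*}
```

On this reading the boundaries 0 and 1 attract (the multiplier $(1+s)^{-1}$ has negative mean logarithm) when $\mu > \frac{1}{2}\sigma^2$, and the point 0.5 attracts when $\mu < \frac{3}{4}\sigma^2$, so all three limits are possible only in the window between the two thresholds.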

When $\alpha < 1$ the picture is far more complex, and it is then possible in
suitable circumstances that a stationary distribution for the frequency of
$A_1$ can arise. The condition for this is that

$$3\alpha/(1 + \alpha) < 2\mu/\sigma^2 < 1 \tag{5.95}$$

and that the initial frequency $x_0$ of $A_1$ be between $\alpha/(1 + \alpha)$
and 1. (The condition (5.95) requires $\alpha < \frac{1}{2}$.) The condition
(5.95) follows from equation (4.8) in Karlin and Liberman (1974) but is rather
less general than their condition, applying only when the moments of $s$ have
the properties we have assumed. The drift and diffusion coefficients for the
model (5.94) are

$$a(x) = x(1 - x)\{(1 + \alpha)x - \alpha\}\{\mu - \sigma^2(x^2 + \alpha(1 - x)^2)\},$$

$$b(x) = \sigma^2 x^2(1 - x)^2\{(1 + \alpha)x - \alpha\}^2. \tag{5.96}$$



If these values are formally inserted into the stationary distribution formula
(4.45), the result is

$$f(x) = \text{const}\; x^{-2\mu/(\alpha\sigma^2)}(1 - x)^{-2\mu/\sigma^2}\{x - \alpha/(1 + \alpha)\}^{2(1+\alpha)\mu/(\alpha\sigma^2) - 4} \tag{5.97}$$

which, for the parameters specified by (5.95), is integrable over
$(\alpha/(1 + \alpha), 1)$. Is it however the required stationary distribution?
Tanaka (1957) has shown that for diffusion with inaccessible boundaries formal
calculation of $f(x)$ in this way does indeed provide the correct stationary
distribution, provided that the resulting function is integrable and that
$\psi(x)$, defined by (4.16), is non-integrable at both boundaries. We have
already checked the former condition and since in this case

$$\psi(x) = \text{const}\; x^{-2 + 2\mu/(\alpha\sigma^2)}(1 - x)^{-2 + 2\mu/\sigma^2}\{x - \alpha/(1 + \alpha)\}^{2 - 2(1+\alpha)\mu/(\alpha\sigma^2)}$$

we see that the second requirement is also satisfied. Further, (4.65) shows
that both $x = 1$ and $x = \alpha/(1 + \alpha)$ are inaccessible and thus (5.97) does
provide the required stationary distribution. The distribution (5.97) also 
applies for the diffusion case (Levikson and Karlin (1975)). 
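A quick numerical sanity check of the stationary density is possible. A density that is stationary with zero probability flux satisfies $a(x)f(x) - \frac{1}{2}\,d\{b(x)f(x)\}/dx = 0$. The sketch below evaluates this flux by a central difference, using the drift and diffusion coefficients of the model (5.94) and the density as read here from (5.97). The parameter values and function names are illustrative assumptions.

```python
def drift(x, mu, sigma2, alpha):
    """Drift coefficient a(x) for the model (5.94)."""
    return (x * (1 - x) * ((1 + alpha) * x - alpha)
            * (mu - sigma2 * (x * x + alpha * (1 - x) ** 2)))


def diffusion(x, sigma2, alpha):
    """Diffusion coefficient b(x) for the model (5.94)."""
    return sigma2 * x ** 2 * (1 - x) ** 2 * ((1 + alpha) * x - alpha) ** 2


def density(x, mu, sigma2, alpha):
    """Unnormalized stationary density on (alpha/(1+alpha), 1), as read from (5.97)."""
    c = alpha / (1 + alpha)
    return (x ** (-2 * mu / (alpha * sigma2))
            * (1 - x) ** (-2 * mu / sigma2)
            * (x - c) ** (2 * (1 + alpha) * mu / (alpha * sigma2) - 4))


def satisfies_condition(mu, sigma2, alpha):
    """The double inequality 3a/(1+a) < 2mu/sigma2 < 1 of (5.95)."""
    return 3 * alpha / (1 + alpha) < 2 * mu / sigma2 < 1


def flux(x, mu, sigma2, alpha, h=1e-6):
    """a*f - (1/2) d(b*f)/dx, with the derivative taken by central difference."""
    def bf(y):
        return diffusion(y, sigma2, alpha) * density(y, mu, sigma2, alpha)
    derivative = (bf(x + h) - bf(x - h)) / (2 * h)
    return drift(x, mu, sigma2, alpha) * density(x, mu, sigma2, alpha) - 0.5 * derivative


if __name__ == "__main__":
    mu, sigma2, alpha = 0.4, 1.0, 0.2   # illustrative values satisfying (5.95)
    print(satisfies_condition(mu, sigma2, alpha))
    for x in (0.3, 0.5, 0.8):
        print(x, flux(x, mu, sigma2, alpha))
```

The flux should vanish up to discretization error at every interior point of $(\alpha/(1+\alpha), 1)$. This checks only the formal calculation, not the boundary conditions that Tanaka's result addresses.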

The choice $\alpha = -1$ implies that fitnesses in (5.94) are additive. For
fixed additive fitnesses the situation is straightforward: $A_1$ will become
fixed when $s > 0$ and $A_2$ will become fixed when $s < 0$. When $s$ is a
random variable, however, neither the boundary $x = 0$ nor the boundary
$x = 1$ is necessarily stochastically locally stable in this additive case. The
respective conditions that the boundary $x = 0$ and the boundary $x = 1$ be
stochastically locally stable are, for small $s$,

$$-\mu - \tfrac{1}{2}\sigma^2 > 0, \qquad \mu - \tfrac{1}{2}\sigma^2 > 0, \tag{5.98}$$

respectively. These conditions show that if the variance of s is sufficiently 
large compared to the mean of s, neither the boundary x = 0 nor the 
boundary x = 1 is stochastically locally stable. 

A model more general than (5.83) and (5.94) allows the fitness of the
various genotypes to be

$$\begin{array}{ccc} A_1A_1 & A_1A_2 & A_2A_2 \\ 1 + \eta_n & 1 & 1 + \epsilon_n \end{array} \tag{5.99}$$

Since only ratios of fitnesses are relevant we do not lose generality by fixing
the fitness of $A_1A_2$ at unity, and this is done in (5.99). The fitnesses
$\eta_n$ and $\epsilon_n$ are the realized values, in generation $n$, of the
random variables $\eta$ and $\epsilon$. These variables are possibly correlated
for any given $n$ (for example, the model (5.94) is a special case of (5.99),
and for the case $\alpha = 1$ in that model, $\eta_n = \epsilon_n$). On the other
hand it is assumed that $(\eta_n, \epsilon_n)$ are independent of
$(\eta_m, \epsilon_m)$ when $n \ne m$. We assume throughout that

$$E(\eta) = \mu_1\delta, \quad \text{var}(\eta) = \sigma_1^2\delta, \quad E(\epsilon) = \mu_2\delta, \quad \text{var}(\epsilon) = \sigma_2^2\delta \tag{5.100}$$

for some small parameter $\delta$, while higher moments of $\eta$ and $\epsilon$
are $O(\delta^2)$ or less.

Let $x_n$ be the frequency of $A_1$ in generation $n$. Our objective is to
determine how properties of the random sequence $x_1, x_2, x_3, \ldots$ depend on
the joint distribution of $\eta$ and $\epsilon$. The starting point for doing
this is the recurrence relationship

$$x_{n+1} = x_n(1 + \eta_n x_n)/\{1 + \eta_n x_n^2 + \epsilon_n(1 - x_n)^2\}. \tag{5.101}$$

The concept of stochastic local stability was defined in connection with
(5.93). For the model (5.99), the boundary $x = 0$ is stochastically locally
stable if $E\log(1 + \epsilon) > 0$ and the boundary $x = 1$ is stochastically
locally stable if $E\log(1 + \eta) > 0$. In other words, since

$$E\log(1 + \eta) = (\mu_1 - \tfrac{1}{2}\sigma_1^2)\delta + O(\delta^2), \tag{5.102}$$

the condition that $x = 1$ be stochastically locally stable is that
$\mu_1 - \frac{1}{2}\sigma_1^2 > 0$, showing the importance of the variance in
determining stochastic local stability behavior at $x = 1$. A parallel remark
applies for $x = 0$.
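The recurrence for the frequency of $A_1$ in the random-fitness model (fitnesses $1 + \eta_n$, $1$, $1 + \epsilon_n$) is easy to iterate directly. The following is a minimal sketch, not the author's code. It assumes, for illustration only, independent two-point distributions for $\eta$ and $\epsilon$ with the stated means and variances. The function names and the seed are hypothetical.

```python
import random


def next_frequency(x, eta, eps):
    """One generation of the recurrence (5.101) for the frequency of A1."""
    return x * (1 + eta * x) / (1 + eta * x * x + eps * (1 - x) ** 2)


def simulate(x0, generations, mu1, mu2, var1, var2, delta, seed=0):
    """Iterate (5.101) with two-point fitness increments (an illustrative choice).

    eta takes the values delta*mu1 +/- sqrt(delta*var1), eps the values
    delta*mu2 +/- sqrt(delta*var2), independently and with equal probability.
    delta must be small enough that all fitnesses stay positive.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(generations):
        eta = delta * mu1 + rng.choice((-1.0, 1.0)) * (delta * var1) ** 0.5
        eps = delta * mu2 + rng.choice((-1.0, 1.0)) * (delta * var2) ** 0.5
        x = next_frequency(x, eta, eps)
    return x


if __name__ == "__main__":
    # mu1 - var1/2 > 0 here, the stated condition for x = 1 to be
    # stochastically locally stable.
    print(simulate(0.95, 5000, mu1=1.0, mu2=-1.0, var1=1.0, var2=1.0, delta=0.01))
```

Starting trajectories close to a boundary and varying $\mu_1 - \frac{1}{2}\sigma_1^2$ around zero gives an informal picture of the stability criterion, although a simulation of finite length cannot by itself establish convergence.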

These conclusions confirm those reached above for the model (5.83). In
this model the boundary $x = 1$ was claimed to be stochastically locally
stable when $E\log(1 - s) < 0$ and the boundary $x = 0$ was claimed to be
stochastically locally stable if $E\log(1 - s) > 0$. The selective parameters
in the model (5.83) are equivalent to $(1 - s)^{-1}$, 1 and $1 - s$, and these
are particular cases of the parameters in (5.99). The stochastic local
stability behavior of this model then agrees with that of the more general
model (5.99).

We now turn to finite populations. Here stochastic fluctuations in gene 
frequency occur for two reasons, random sampling effects and stochastic 
variation in fitness. We consider as an example the behavior of diploid 
models with fitness scheme (5.99). If $E(\eta) = E(\epsilon)$,
$\text{var}(\eta) = \text{var}(\epsilon) = \sigma^2$,
$\text{corr}(\eta, \epsilon) = r$ and $A_1$ mutates to $A_2$ at rate $u$ with
reverse mutation also at rate $u$, there will exist a symmetric stationary
distribution about $x = 0.5$ of the frequency of $A_1$ for any finite
population size $N$. This distribution can be found explicitly (Avery (1977))
and increasingly concentrates near $x = \frac{1}{2}$ as $\sigma^2$ increases. In
other words, increasing the variance of the homozygote fitnesses increases the
degree of heterozygosity in the population. The degree of heterozygosity also
increases with $r$. These interesting and important conclusions differ
qualitatively from those found by incorrect analyses of Kimura (1954) and
Ohta (1972).

If there is no mutation, interest centers on fixation probabilities and mean
fixation times. Avery (1977) found that the probability of fixation of a new
mutant increases substantially as $\sigma^2$ increases, whereas it is almost
independent of the value of $r$. The conditional mean times to fixation and to
loss also increase substantially with $\sigma^2$ and with $r$. Again, the
earlier conclusions of Kimura and Ohta are qualitatively incorrect. The general
observation in all cases is that increasing the variance of homozygote fitness
tends to increase genetic variation in populations, often by substantial
amounts. This observation is clearly of some relevance in possibly explaining
at least part of the large genetic variation observed in natural populations.

We turn now to spatial variation. Although normally this is considered
when temporal variation also occurs, we first consider a model, due to
Levene (1953), for which fixed fitness regimes occur in each of $M$ habitats,
the fitnesses in habitat $i$ being $1 + \eta^{(i)}$, $1$, $1 + \epsilon^{(i)}$.
In this “fixed Levene model” the entire population mates at random in a common
area and then disperses to the various habitats, a fraction $c_i$ entering
habitat $i$. The recurrence relation for the frequency $x$ of $A_1$ is

$$x_{n+1} = \sum_i c_i\big[\{x_n + \eta^{(i)} x_n^2\}/\{1 + \eta^{(i)} x_n^2 + \epsilon^{(i)}(1 - x_n)^2\}\big]. \tag{5.103}$$

The equilibrium solutions of this system are of particular interest. Karlin
(1977b) proved that in general at most three internal equilibria of (5.103)
can exist, and that in certain cases at most one can exist. In particular,
when

$$\eta^{(i)} + \epsilon^{(i)} + \eta^{(i)}\epsilon^{(i)} \le 0 \tag{5.104}$$

for $i = 1, 2, 3, \ldots, M$, at most one internal equilibrium exists, and if
such an equilibrium does exist it is globally stable. If no such equilibrium
exists there is a unique globally stable fixation point.

One case where (5.104) plainly applies is where $\eta^{(i)} < 0$,
$\epsilon^{(i)} < 0$ for all $i$. Here there always exists a unique globally
stable internal equilibrium, analogous to (1.31). A second case where (5.104)
holds is the linear scheme $\eta^{(i)} = -\epsilon^{(i)}$. Whether or not an
internal equilibrium exists depends in a complicated way on the $\eta^{(i)}$
and $c_i$ values. A parallel remark holds when fitnesses are of the
multiplicative form for which $1 + \epsilon^{(i)} = (1 + \eta^{(i)})^{-1}$.
The location of the equilibrium point must be found numerically from the
recurrence relation (5.103).
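Finding the equilibrium numerically amounts to iterating the recurrence until successive values agree. The sketch below does this by plain fixed-point iteration. The habitat fractions and fitness increments are illustrative assumptions, and the function names are hypothetical.

```python
def levene_step(x, c, eta, eps):
    """One generation of the fixed Levene recurrence (5.103).

    c[i] is the fraction entering habitat i; the genotype fitnesses there
    are 1 + eta[i], 1, 1 + eps[i].
    """
    return sum(
        ci * (x + ei * x * x) / (1 + ei * x * x + fi * (1 - x) ** 2)
        for ci, ei, fi in zip(c, eta, eps)
    )


def levene_equilibrium(x0, c, eta, eps, tol=1e-14, max_iter=100_000):
    """Iterate (5.103) from x0 until successive frequencies differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_new = levene_step(x, c, eta, eps)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("no convergence within max_iter iterations")


if __name__ == "__main__":
    # Two habitats, heterozygote favored in both (illustrative values).
    c, eta, eps = [0.5, 0.5], [-0.2, -0.1], [-0.1, -0.3]
    print(levene_equilibrium(0.5, c, eta, eps))
```

With a single habitat and fitnesses $1 - s_1$, $1$, $1 - s_2$ the iteration should return the classical value $s_2/(s_1 + s_2)$, which gives a simple check. Plain iteration converges only to stable equilibria, so it is not a way of locating unstable internal equilibria.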

As might be expected, if $1 + \eta^{(i)} > 1 > 1 + \epsilon^{(i)}$ for all $i$,
$A_1$ becomes fixed in the population, with a corresponding conclusion for
$A_2$ when $1 + \eta^{(i)} < 1 < 1 + \epsilon^{(i)}$.

We consider finally models involving both spatial and temporal variation.
Gillespie (1974a, b, 1976a, b, 1977a, b, 1978) and Gillespie and Langley
(1974, 1976) have analyzed an increasingly complex series of such models,
culminating in a “stochastically additive scale, concave fitness function”
(SAS-CFF) model, that leads to broad predictions concerning natural
populations that fit well with observed genetic patterns. While various other
spatial-temporal models also have to be considered, we concentrate here on
the Gillespie-Langley model and its practical implications.

Consider first a single population with fitnesses of the additive form

$$\begin{array}{ccc} A_1A_1 & A_1A_2 & A_2A_2 \\ 1 + s & 1 + \frac{1}{2}(s + t) & 1 + t \end{array} \tag{5.105}$$

Here the values of $s$ and $t$ vary from generation to generation according to
a stochastic process for which

$$E(s) = \mu_s, \quad \text{var}(s) = \sigma_s^2, \quad E(t) = \mu_t, \quad \text{var}(t) = \sigma_t^2, \quad \text{corr}(s, t) = \rho. \tag{5.106}$$

Values of $s$ and $t$ at different time points are assumed to be independent.
By suitably amending (5.98) or by direct argument, Gillespie (1974a) proved
that a polymorphic stationary distribution for the frequency of $A_1$ exists
if $|\alpha| < 1$, where

$$\alpha = 4(\mu_s - \mu_t + \tfrac{1}{2}(\sigma_t^2 - \sigma_s^2))/\{\sigma_s^2 + \sigma_t^2 - 2\rho\sigma_s\sigma_t\}. \tag{5.107}$$

He further showed that when the stationary distribution of the frequency
$x$ of $A_1$ exists, it is of the form

$$f(x) = \text{const}\; x^{\alpha}(1 - x)^{-\alpha}. \tag{5.108}$$

We next consider a random Levene model which is identical to the fixed
Levene model considered above, except that the fitnesses in any subpopulation
vary randomly through time. Suppose that at time $n$ the fitnesses in the
$i$th subpopulation are $1 + s_n^{(i)}$, $1 + \frac{1}{2}(s_n^{(i)} + t_n^{(i)})$,
$1 + t_n^{(i)}$, respectively. These fitnesses are assumed to be independent
over time but not necessarily from one subpopulation to another at any given
time. Then (Gillespie (1974a)) the appropriate generalization of the above
result is that a stationary polymorphic distribution for the frequency $x$ of
$A_1$ exists if and only if

$$4|\mu_s - \mu_t + \tfrac{1}{2}(\sigma_t^2 - \sigma_s^2)|/\{(\sigma_s^2 + \sigma_t^2 - 2\rho\sigma_s\sigma_t)(1 + \pi(1 - k))\} < 1, \tag{5.109}$$

where

$$\pi = \sum\sum_{i > j} c_i c_j, \qquad k = (\sigma_{ss} + \sigma_{tt} - 2\sigma_{sstt})/(\sigma_s^2 + \sigma_t^2 - 2\rho\sigma_s\sigma_t),$$

$\sigma_{ss}$ ($\sigma_{tt}$) = covariance of the $s$ ($t$) values in two
subpopulations at the same time,

$\sigma_{sstt}$ = covariance of the $s$ value from one subpopulation with
the $t$ value from another subpopulation at the same time.

The stationary distribution is again of the form given in (5.108). Increased
patchiness as measured by increased $\pi$ increases the probability that
polymorphism will arise, as does a decrease in the spatial correlations. For
completely correlated environments $k = 1$, in which case the condition
(5.109) reduces to $|\alpha| < 1$. The means and variances of $s$ and $t$ are
both important determinants of whether or not a stationary polymorphic
distribution exists.
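The two polymorphism conditions are simple enough to evaluate directly. The sketch below computes the quantity $\alpha$ of (5.107) and the left-hand side of (5.109), taking $\pi$ and $k$ as inputs rather than computing them from the $c_i$ and the spatial covariances. The function names and parameter values are hypothetical illustrations.

```python
def gillespie_alpha(mu_s, mu_t, var_s, var_t, rho):
    """The quantity alpha of (5.107) for the additive random-fitness model."""
    v = var_s + var_t - 2.0 * rho * (var_s * var_t) ** 0.5
    return 4.0 * (mu_s - mu_t + 0.5 * (var_t - var_s)) / v


def random_levene_ratio(mu_s, mu_t, var_s, var_t, rho, pi, k):
    """Left-hand side of (5.109); a stationary polymorphism requires a value below 1."""
    return abs(gillespie_alpha(mu_s, mu_t, var_s, var_t, rho)) / (1.0 + pi * (1.0 - k))


if __name__ == "__main__":
    a = gillespie_alpha(0.01, 0.0, 0.1, 0.1, 0.0)
    print(a, abs(a) < 1.0)
    # Spatially uncorrelated habitats (k = 0) relax the condition.
    print(random_levene_ratio(0.01, 0.0, 0.1, 0.1, 0.0, pi=0.25, k=0.0))
```

Setting `k=1.0` should reproduce `abs(alpha)` exactly, which mirrors the statement that (5.109) reduces to $|\alpha| < 1$ for completely correlated environments.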

All the above assumes additive fitnesses. Gillespie (1976a, b, 1978) and
Gillespie and Langley (1974) argue more generally that it is more likely that
the enzymatic activities of the various genotypes are additive and that the
fitnesses of any genotype are concave functions of these activities. Under
this argument the activities of the three genotypes are of the form $z_1$,
$\frac{1}{2}(z_1 + z_2)$, $z_2$ and the fitnesses are $\phi(z_1)$,
$\phi(\frac{1}{2}(z_1 + z_2))$, $\phi(z_2)$, where $\phi(z)$ is a concave
function satisfying

$$\phi'(z) > 0, \quad \phi''(z) < 0, \quad \lim_{z \to \infty}\phi(z) = K < \infty. \tag{5.110}$$

Perhaps the simplest function having these properties is

$$\phi(z) = (1 + c)z/(c + z). \tag{5.111}$$

It is clear that the probability of polymorphism in this SAS-CFF model is
increased compared to that in the cases when the fitnesses themselves are
additive, since $\phi(\frac{1}{2}(z_1 + z_2)) > \frac{1}{2}\phi(z_1) + \frac{1}{2}\phi(z_2)$.
If $z_1$ and $z_2$ have means $\mu_1$ and $\mu_2$, common variance $\sigma^2$,
correlation $\rho$, the polymorphism requirement generalizing (5.109) is

$$2|\mu_1 - \mu_2|/[\sigma^2(1 - \rho)\{\phi'(1)(1 + \pi(1 - k)) - \phi''(1)/\phi'(1)\}] < 1, \tag{5.112}$$

where $k$ and $\pi$ are as defined above, the covariances $\sigma_{ss}$,
$\sigma_{tt}$ and $\sigma_{sstt}$ referring to the distribution of the $z$'s.
The stationary distribution, when it exists, is a beta distribution
generalizing (5.108).
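The properties required of the concave fitness function are easy to confirm for the simple choice $\phi(z) = (1 + c)z/(c + z)$. The sketch below evaluates $\phi$ and its first two derivatives, which were worked out by hand for this illustration, and checks the heterozygote inequality. The function names and the value of $c$ are assumptions.

```python
def phi(z, c):
    """The concave fitness function (5.111); note phi(1) = 1 and phi -> 1 + c."""
    return (1.0 + c) * z / (c + z)


def phi_prime(z, c):
    """First derivative: c(1 + c)/(c + z)**2, positive for c > 0."""
    return c * (1.0 + c) / (c + z) ** 2


def phi_second(z, c):
    """Second derivative: -2c(1 + c)/(c + z)**3, negative for c > 0 and z > 0."""
    return -2.0 * c * (1.0 + c) / (c + z) ** 3


def heterozygote_excess(z1, z2, c):
    """phi at the mean activity minus the mean of phi; positive by concavity."""
    return phi(0.5 * (z1 + z2), c) - 0.5 * (phi(z1, c) + phi(z2, c))


if __name__ == "__main__":
    c = 0.5
    print(phi(1.0, c), phi_prime(1.0, c), phi_second(1.0, c) / phi_prime(1.0, c))
    print(heterozygote_excess(0.8, 1.2, c))
```

At $z = 1$ these derivatives give $\phi'(1) = c/(1 + c)$ and $\phi''(1)/\phi'(1) = -2/(1 + c)$, the two quantities entering the polymorphism condition that generalizes (5.109).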

Gillespie (1977a, pp. 305-311) demonstrated that the predictions of this
model, especially for $\phi(z)$ of the form (5.111), fit in remarkably well
with a large series of observations of natural polymorphisms. We do not pursue
these comparisons here, nor the generalizations of the model to the
multilocus case where the expected nature of observed linkage disequilibrium
is also discussed, since they go beyond the purely mathematical analyses
that are our primary concern.

We conclude this section by observing that several diffusion processes 
discussed in this section possess natural boundaries. Thus since regular, 
exit and entrance boundaries have already been encountered in previous 
sections, we have seen that even rather simple genetic models can lead to 
all four of the boundary classifications given in (4.65). 



5.9 Time-Reversal and Age Properties 

It has been remarked on several occasions earlier that information about the 
past behavior of diffusion processes allowing a stationary distribution can
be obtained by determining properties of the future behavior. We should
therefore be able to use some of the conclusions reached above to discuss 
past behavior of various processes, and in particular to find properties of 
the “age” of an allele. 

The time-reversal property states that for any diffusion on [0, 1] admit- 
ting a stationary distribution, the probability of any sample path leading 
from x (at time 0) to y (at time t) is equal to that of the “mirror-image” 
path leading from y (at time —t) to x (at time 0). Unfortunately this 
observation is not immediately useful for several questions of interest in 
population genetics, since these questions refer to processes for which either 
the boundary {0}, or the boundary {1}, or both, are accessible absorbing 
states of the diffusion process, and thus for which no stationary distribution 
exists. This problem can be overcome in the following way. 

Suppose that {0} is an absorbing state but that {1} is not: This will
occur in practice, for example, if there is mutation from $A_1$ to $A_2$ but
no reverse mutation. Now introduce mutation from $A_2$ to $A_1$ at rate
$\epsilon$: A stationary distribution now exists and reversibility arguments
apply. In particular, given a current value $x$ for the frequency of $A_1$,
the distribution of the time (in the future) until {0} is next reached is
identical to that of the time (in the past) that it was last left. Now let
$\epsilon \to 0$: The distribution of the time (in the future) until the
frequency reaches 0 converges to that applying when $\epsilon = 0$. The
limiting distribution is then identical to the age distribution of an allele
which arose as a unique new mutation and whose current frequency is $x$. This
argument can be made more precise (Watterson (1977b), Levikson (1977a, b)) by
introducing a “return” process whereby the frequency of $A_1$ is returned from
0 to $\delta$ ($\delta > 0$) whenever 0 is reached: In practice we put
$\delta = (2N)^{-1}$ to correspond to the frequency of a new mutant. We now
give some examples of the conclusions reached by this argument.

Consider first the case of no selection or mutation. Assume the allele
$A_1$ arose by a unique mutation in an otherwise pure $A_2A_2$ population and
is now observed with frequency $x$. The distribution of its age is thus the
distribution of its time until loss, conditional on the event that eventual
loss does occur. This distribution can be found by centering attention on
$A_2$ (with current frequency $1 - x$) rather than $A_1$, and is then given by
(5.39) with $p = 1 - x$. The mean age can be found either through this
distribution or alternatively by replacing $p$ by $1 - x$ in (5.34). This
leads to a neutral theory mean age of $-4Nx(1 - x)^{-1}\log x$ generations.
The variance of the age is found by putting $p = 1 - x$ in (5.37).

A parallel formula can be found when we assume fitness values $1 + s$ for
$A_1A_1$, $1 + \frac{1}{2}s$ for $A_1A_2$ and 1 for $A_2A_2$. Use of (5.54)
(with $p = x$) shows that the mean age of $A_1$, given that it is currently
observed with frequency $x$, is

$$\int_0^x 4N[\alpha\{e^{\alpha} - 1\}]^{-1}\{e^{\alpha y} - 1\}\{e^{\alpha(1-y)} - 1\}\{y(1 - y)\}^{-1}\,dy$$

$$+ \int_x^1 4N\{1 - e^{-\alpha x}\}[\alpha\{1 - e^{-\alpha}\}\{e^{\alpha(1-x)} - 1\}]^{-1} e^{-\alpha(1-y)}\{e^{\alpha(1-y)} - 1\}^2 \{y(1 - y)\}^{-1}\,dy \tag{5.113}$$

generations. This converges to the neutral theory expression as
$\alpha \to 0$, as we expect, and the form of the integrand allows calculation
of the mean time, in the past, that the frequency of $A_1$ assumed a value in
any arbitrary interval $(y_1, y_2)$.
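The convergence to the neutral value can be checked numerically. The sketch below integrates the conditional sojourn-time densities for the selective case by a midpoint rule and compares the result with $-4Nx(1 - x)^{-1}\log x$. The integrands are written out as they are read here from the mean-age expression: below the current frequency $x$ the density is symmetric in $y$ and $1 - y$, while above $x$ it carries the factor $e^{-\alpha(1-y)}\{e^{\alpha(1-y)} - 1\}^2$. That reading, the function names and the step count are assumptions of this illustration.

```python
import math


def neutral_mean_age(x, n_pop):
    """Neutral mean age -4N x log(x)/(1 - x), in generations."""
    return -4.0 * n_pop * x * math.log(x) / (1.0 - x)


def _midpoint(func, a, b, steps):
    """Composite midpoint rule; it never evaluates func at the end points."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))


def selected_mean_age(x, alpha, n_pop, steps=20000):
    """Mean age under genic selection (alpha nonzero), by numerical integration."""
    def below(y):   # sojourn density for frequencies y < x
        return (math.expm1(alpha * y) * math.expm1(alpha * (1.0 - y))
                / (alpha * math.expm1(alpha) * y * (1.0 - y)))

    prefactor = (-math.expm1(-alpha * x)
                 / (alpha * (-math.expm1(-alpha)) * math.expm1(alpha * (1.0 - x))))

    def above(y):   # sojourn density for frequencies y > x
        u = alpha * (1.0 - y)
        return prefactor * math.exp(-u) * math.expm1(u) ** 2 / (y * (1.0 - y))

    return 4.0 * n_pop * (_midpoint(below, 0.0, x, steps)
                          + _midpoint(above, x, 1.0, steps))


if __name__ == "__main__":
    print(neutral_mean_age(0.3, 1000), selected_mean_age(0.3, 1e-4, 1000))
    print(selected_mean_age(0.3, 2.0, 1000), selected_mean_age(0.3, -2.0, 1000))
```

The second line of output compares $\alpha$ with $-\alpha$. The two values should agree, reflecting the independence of the conditional mean time from the sign of the selective parameter.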

Suppose now $A_1$ mutates to $A_2$ at rate $u$ with no reverse mutation. If
one initial $A_1$ gene occurred by a unique mutation and the frequency of
$A_1$ is currently observed at $x$, the mean age of $A_1$ is, from (3.19),

$$4N(1 - \theta)^{-1}\int_0^x y^{-1}\{(1 - y)^{\theta - 1} - 1\}\,dy$$

$$+ 4N(1 - \theta)^{-1}\{1 - (1 - x)^{1 - \theta}\}\int_x^1 y^{-1}(1 - y)^{\theta - 1}\,dy \tag{5.114}$$

generations. A case of particular interest is that for which $x = 1$,
corresponding to temporary fixation of $A_1$: This evaluation is allowed only
when $\theta < 1$.

It is also possible to consider the mean age of $A_1$ conditional on the
requirement that the frequency of $A_1$ was never unity in the past. This is
identical to the mean time for loss of $A_1$ given that its future frequency
never achieves the value unity. This is given by (5.114) for $\theta \ge 1$,
since then the condition that the frequency of $A_1$ never reaches unity is
automatically satisfied. For $\theta < 1$ the probability that the frequency
of $A_1$ never reaches unity given a current value $x$ is found from (4.15) to
be $(1 - x)^{1 - \theta}$. Use of (4.51) and (4.52) then shows that the
conditional mean age of $A_1$ is

$$\int_0^x 4N y^{-1}(1 - \theta)^{-1}\{1 - (1 - y)^{1 - \theta}\}\,dy$$

$$+ \int_x^1 4N y^{-1}(1 - \theta)^{-1}(1 - y)^{1 - \theta}\{(1 - x)^{\theta - 1} - 1\}\,dy. \tag{5.115}$$

This reduces to the expression (5.64) for $x = (2N)^{-1}$. Equation (5.115)
was first given by Kimura and Ohta (1973), using an approach different from
that just given.

The conclusions reached above can be reached in a different way (Sawyer
(1977)). If the Markovian random variable is approximated by a diffusion
variable, the appropriate drift and diffusion coefficients are

$$a(x) = 1 - x, \quad b(x) = x(1 - x). \tag{5.116}$$

In Sawyer’s approach the random variable is moved at random times im- 
mediately to the boundary x = 0. These moves occur at times that are 
independent of the current value of the variable, and the times between 
consecutive moves are independent random variables with density function 

$$f(t) = \tfrac{1}{2}\theta\exp(-\tfrac{1}{2}\theta t), \quad 0 < t < \infty. \tag{5.117}$$

This approach to finding age properties arises because, if a given gene in
the ancestral line considered is a new mutant, the frequency of its allelic
type in the population is $(2N)^{-1}$ when it first occurs, irrespective of
the frequency $x$ of the allelic type of its parent gene in the previous
generation.
Here the boundary x = 1 is not absorbing while the boundary x = 0 is 
not accessible, that is, it cannot be reached by drift from the interior of 
(0, 1), although we have seen this boundary can be reached because of the 
discrete moves in the process. The frequency x of the allelic type in the 
ancestral line has a stationary distribution which is found by Sawyer using 
renewal theory arguments. Sawyer found that 

$$\lim_{t \to \infty}\text{Prob}\{x_t \le p\} = 1 - (1 - p)^{\theta/2}, \tag{5.118}$$

and the reversibility argument shows immediately that the distribution of
frequencies seen in the past is identical to that to be seen in the future,
given a current frequency $x$. From this observation we can derive the
expression (5.114) as well as find further age properties of the process.

We conclude this section by discussing the various symmetry properties 
in Section 5.4 in the light of time-reversal arguments. We take as our start- 
ing point the observation that any two of the equations (5.57), (5.59) and 
(5.60) imply the remaining equation. Now (5.59) is true by direct argu- 
ment, so the observation of prime interest is that (5.57) implies (5.60) and 
vice versa. Our starting point in Section 5.4 was that (5.57) was true by 
computation of both sides in the formula, and this then implies the truth 
of (5.60). Our starting point here is that time-reversal arguments imply the 
truth of (5.60). This is true because the reflection of a path leading from
$\epsilon$ ($\epsilon$ small) to $1 - \epsilon$ is a path leading from
$1 - \epsilon$ to $\epsilon$. The identity of the probability of the two paths
leads directly to (5.60), and we conclude that time-reversibility implies
(5.60) and hence (5.57) and (5.58). The further facts that $t^*(x;p)$ defined
in (5.53) is symmetric about 0.5 and independent of the sign of $\alpha$ do
not appear to follow directly from time-reversal arguments, although use of
(5.57), which we have seen can be arrived at through time-reversal arguments,
shows that when $h = \frac{1}{2}$ either property implies the other.

5.10 Multi-Allele Diffusion Processes



In this section we consider diffusion approximations to finite Markov chain
$K$-allele models of the form (3.68).

The simplest version of the model (3.68) arises when the function $\psi_i$
takes the value $X_i/2N$. It is clear that in this Markov chain model the
probability of fixation of any allele is its initial frequency, and we also
have the eigenvalue formula (3.69) concerning the rate of decrease of the
probability that $j$ or more alleles exist at time $t$. To obtain further
results we turn to the diffusion approximation to (3.68).

We write $x_i = X_i/2N$ ($i = 1, 2, \ldots, K - 1$) and let $\delta x_i$ be the
change in $x_i$ from one generation to the next. Then elementary theory shows
that given $x_1, \ldots, x_{K-1}$,

$$E(\delta x_i) = 0, \quad \text{var}(\delta x_i) = (2N)^{-1}x_i(1 - x_i), \quad \text{covar}(\delta x_i, \delta x_j) = -(2N)^{-1}x_i x_j.$$

These parameters, in conjunction with (4.73), lead to the following partial
differential equation for the joint density function
$f = f(x_1, \ldots, x_{K-1}; t)$ of $x_1, \ldots, x_{K-1}$ at time $t$, where unit
time corresponds to $2N$ generations:



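$$\frac{\partial f}{\partial t} = \frac{1}{2}\sum_{i=1}^{K-1}\frac{\partial^2}{\partial x_i^2}\{x_i(1 - x_i)f\} - \sum\sum_{i<j}\frac{\partial^2}{\partial x_i\,\partial x_j}\{x_i x_j f\}.$$

This is a generalization of (5.10) and admits an eigenfunction solution
generalizing (5.11), which has been found by Littler and Fackerall (1975)
and Griffiths (1979c). The corresponding backward equation is

$$\frac{\partial f}{\partial t} = \frac{1}{2}\sum_{i=1}^{K-1}p_i(1 - p_i)\frac{\partial^2 f}{\partial p_i^2} - \sum\sum_{i<j}p_i p_j\frac{\partial^2 f}{\partial p_i\,\partial p_j},$$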



where $p_i$ is the initial value of $x_i$. This equation may be used to find
various fixation probabilities. The probability $\pi$
($= \pi(p_1, p_2, \ldots, p_{K-1})$) of any fixation event satisfies

$$\frac{1}{2}\sum_{i=1}^{K-1}p_i(1 - p_i)\frac{\partial^2 \pi}{\partial p_i^2} - \sum\sum_{i<j}p_i p_j\frac{\partial^2 \pi}{\partial p_i\,\partial p_j} = 0, \tag{5.119}$$

subject to the appropriate boundary conditions. For example, the probability
that $A_i$ eventually fixes satisfies (5.119) together with the boundary
conditions

$$\pi(p_1, \ldots, p_{K-1}) = 1 \quad \text{if } p_i = 1,$$

$$\pi(p_1, \ldots, p_{K-1}) = 0 \quad \text{if } p_j + p_m + \cdots + p_u = 1 \quad (j, m, \ldots, u \ne i).$$

The solution of these equations is $\pi = p_i$, which we know also to be
exactly correct for the model (3.68) with $\psi_i = X_i/2N$. Suppose now that
we wish to find the probability $\pi$ that ultimately $A_i$ and $A_j$ are the
last two alleles to exist. Here the boundary conditions are

$$\pi(p_1, \ldots, p_{K-1}) = 1 \quad \text{if } p_i + p_j = 1,$$

$$\pi(p_1, \ldots, p_{K-1}) = 0 \quad \text{if } p_m + p_s + \cdots + p_u = 1$$

and the solution of (5.119) satisfying these conditions is

$$\pi = p_i p_j\{(1 - p_i)^{-1} + (1 - p_j)^{-1}\}.$$

In the case $K = 3$ this shows, for example, that the probability that $A_1$
is the first allele to be lost is

$$p_2 p_3\{(1 - p_2)^{-1} + (1 - p_3)^{-1}\}.$$

Similar probabilities may be found for other fixation events.
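A useful check on the "last two alleles" formula is that the probabilities for all pairs must sum to one, since some pair is always the last to coexist. The sketch below evaluates $p_i p_j\{(1 - p_i)^{-1} + (1 - p_j)^{-1}\}$ for every pair. The function names and the frequency vectors are illustrative assumptions.

```python
from itertools import combinations


def prob_last_two(p, i, j):
    """Probability that alleles i and j (0-based indices) are the last two present."""
    return p[i] * p[j] * (1.0 / (1.0 - p[i]) + 1.0 / (1.0 - p[j]))


def total_over_pairs(p):
    """Sum of prob_last_two over all unordered pairs; should equal 1."""
    return sum(prob_last_two(p, i, j) for i, j in combinations(range(len(p)), 2))


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    # With K = 3, "A1 is lost first" means the same as "A2 and A3 are the last two".
    print(prob_last_two(p, 1, 2))
    print(total_over_pairs(p), total_over_pairs([0.4, 0.3, 0.2, 0.1]))
```

The sum equals one for any $K$, because summing over $j \ne i$ gives $p_i(1 - p_i)/(1 - p_i) = p_i$ for each $i$.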

We turn now to questions concerning the time until various fixation
events occur. The development is easiest when $K = 3$, so we discuss the
analysis in detail in this case only, and quote results for larger values of
$K$. Complete details are available in Littler (1975).

Define $T_i$ as the time required until exactly $i$ ($i = 1, 2$) alleles exist
in the population. We first find an expression for $E(T_1)$. Conditional on
the event that $A_1$ is the last remaining allele, the mean of $T_1$ is

$$E(T_1 \mid A_1) = -2p_1^{-1}(1 - p_1)\log(1 - p_1),$$

from (5.34). Since the probability is $p_1$ that indeed $A_1$ is the last
remaining allele we have

$$E(T_1) = -2[(1 - p_1)\log(1 - p_1) + (1 - p_2)\log(1 - p_2) + (1 - p_3)\log(1 - p_3)].$$

Clearly this value can be extended immediately to the case of $K$ alleles to
get

$$E(T_1) = -2\sum_{i=1}^{K}(1 - p_i)\log(1 - p_i). \tag{5.120}$$

It is equally straightforward to use the analysis leading to (5.37) to find
an expression for the variance of $T_1$.

In the three-allele case we find $E(T_2)$ as follows. The event $T_2 \le t$
implies that at time $t$, at least one $x_i$ value is zero. Standard
probabilistic formulas for unions of events give

$$E(T_2) = -2\Big[\sum_i p_i\log p_i - \sum_i (1 - p_i)\log(1 - p_i)\Big]. \tag{5.121}$$

In the particular case $p_i = 1/3$ these formulas give

$E(T_1) \approx 3.2N$ generations, $E(T_2) \approx 1.2N$ generations.

Littler (1975) found that in the $K$-allele case,



$$E(T_j) = -2\sum_{s=1}^{j}(-1)^{j-s}\binom{K - 1 - s}{j - s}\sum (1 - p_{i_1} - \cdots - p_{i_s})\log(1 - p_{i_1} - \cdots - p_{i_s}) \tag{5.122}$$

where the inner sum is taken over all possible values
$1 \le i_1 < i_2 < \cdots < i_s \le K$. This reduces to (5.120) when $j = 1$ and
generalizes (5.121) to arbitrary $K$ when $j = 2$. It is of some interest to
note that if $p_i = K^{-1}$,

$$\lim_{K \to \infty} E(T_j) = 2/j, \quad j = 1, 2, \ldots. \tag{5.123}$$

This conclusion may be compared to the “eigenvalue” expression (3.69),
and this comparison shows that the eigenvalues give a poor indication of
the way in which $E(T_j)$ changes as a function of $j$.
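The mean times can be evaluated directly from the inclusion-exclusion form of Littler's result, here read as a signed sum over subsets of $s \le j$ alleles with binomial weights $\binom{K-1-s}{j-s}$. The sketch below implements that reading in units of $2N$ generations. It is an illustration under that assumption, and the function name is hypothetical.

```python
from itertools import combinations
from math import comb, log


def mean_time_to_j_alleles(p, j):
    """E(T_j), in units of 2N generations, for initial frequencies p (sum to 1).

    Implements the signed sum over subsets of s = 1..j alleles with weights
    (-1)**(j - s) * C(K - 1 - s, j - s), following the form of (5.122).
    """
    K = len(p)
    if not 1 <= j <= K - 1:
        raise ValueError("j must satisfy 1 <= j <= K - 1")
    total = 0.0
    for s in range(1, j + 1):
        inner = 0.0
        for subset in combinations(p, s):
            rest = 1.0 - sum(subset)
            if rest > 0.0:          # (1 - q) log(1 - q) tends to 0 as q -> 1
                inner += rest * log(rest)
        total += (-1) ** (j - s) * comb(K - 1 - s, j - s) * inner
    return -2.0 * total


if __name__ == "__main__":
    thirds = [1 / 3] * 3
    print(mean_time_to_j_alleles(thirds, 1), mean_time_to_j_alleles(thirds, 2))
    print(mean_time_to_j_alleles([1 / 200] * 200, 2))   # approaches 2/j = 1
```

For three equally frequent alleles this gives $4\log(3/2)$ and $2\log(4/3)$ time units, that is about $3.2N$ and $1.2N$ generations. With many equally frequent alleles the second value approaches the limit $2/j$ with $j = 2$.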

A further asymptotic result of interest is given by Littler (1975). If
$M(t)$ is the number of alleles present at time $t$ in a $K$-allele model with
$p_i = K^{-1}$, then

$$E(M(t)) \sim 1 + 3e^{-t} + 5e^{-3t} + 7e^{-6t} + \cdots + (2j + 1)e^{-\frac{1}{2}j(j+1)t} + \cdots \tag{5.124}$$

as $K \to \infty$.

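Suppose now $A_i$ mutates to $A_j$ at positive rate $u_{ij}$ ($i \ne j$). It
follows that

$$E(\delta x_i) = -x_i\sum_j u_{ij} + \sum_j x_j u_{ji} = (2N)^{-1}m_i(x_1, \ldots, x_{K-1}),$$

say, where

$$m_i(x_1, \ldots, x_{K-1}) = -x_i\sum_j \beta_{ij} + \sum_j x_j\beta_{ji}$$

and $\beta_{ij} = 2Nu_{ij}$. If each $u_{ij} > 0$ ($i \ne j$), there will exist a
stationary distribution for the joint frequency of $A_1, \ldots, A_{K-1}$. This
distribution has not been found in general, although it must clearly satisfy
the stationarity equation

$$0 = -\sum_{i=1}^{K-1}\frac{\partial}{\partial x_i}\{m_i(x_1, \ldots, x_{K-1})f(x_1, \ldots, x_{K-1})\} + \frac{1}{2}\sum_{i=1}^{K-1}\frac{\partial^2}{\partial x_i^2}\{x_i(1 - x_i)f(x_1, \ldots, x_{K-1})\} - \sum\sum_{i<j}\frac{\partial^2}{\partial x_i\,\partial x_j}\{x_i x_j f(x_1, \ldots, x_{K-1})\}. \tag{5.125}$$

Fortunately, in one case of special interest, (5.125) can be solved explicitly.
This is the equal-mutation model for which $u_{ij}$ is independent of $i$ and
$j$.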







Suppose

$$u_{ij} = u/(K - 1),$$

so that $u$ is the total mutation rate for any gene. Then the appropriate
solution of (5.125) is

$$f(x_1, \ldots, x_{K-1}) = \frac{\Gamma(K\epsilon)}{\{\Gamma(\epsilon)\}^K}(x_1 x_2 \cdots x_K)^{\epsilon - 1}, \tag{5.126}$$

where we write for convenience $x_K = 1 - x_1 - x_2 - \cdots - x_{K-1}$ and
$\epsilon = 4Nu/(K - 1)$. This (Dirichlet) distribution arises in several areas
of statistics, and its properties are well known. It may be used to re-derive
various formulae already found by other methods and, as we soon note, to
find new ones. Thus the probability that two genes drawn at random are
of the same allelic type is

$$\int \cdots \int \Big(\sum_{i=1}^{K} x_i^2\Big) f(x_1, \ldots, x_{K-1})\,dx_1 \cdots dx_{K-1},$$

and this leads, after some calculation, to the expression in (3.70).
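The calculation referred to uses only the second moments of a symmetric Dirichlet distribution, for which $E(x_i^2) = \epsilon(\epsilon + 1)/\{K\epsilon(K\epsilon + 1)\}$. The sketch below evaluates the resulting homozygosity in closed form and, as a rough check, by Monte Carlo sampling from normalized gamma variables. The function names, the seed and the sample size are assumptions, and the closed form is stated from the Dirichlet moments rather than copied from (3.70).

```python
import random


def dirichlet_homozygosity(K, theta):
    """E(sum of x_i**2) under the symmetric Dirichlet (5.126), with theta = 4Nu.

    Each of the K parameters equals eps = theta/(K - 1), so the value is
    K * eps*(eps + 1) / (K*eps*(K*eps + 1)) = (1 + eps)/(1 + K*eps).
    """
    eps = theta / (K - 1)
    return (1.0 + eps) / (1.0 + K * eps)


def simulated_homozygosity(K, theta, draws=20000, seed=1):
    """Monte Carlo estimate using Dirichlet draws built from gamma variables."""
    rng = random.Random(seed)
    eps = theta / (K - 1)
    total = 0.0
    for _ in range(draws):
        g = [rng.gammavariate(eps, 1.0) for _ in range(K)]
        norm = sum(g)
        total += sum((v / norm) ** 2 for v in g)
    return total / draws


if __name__ == "__main__":
    print(dirichlet_homozygosity(4, 2.0), simulated_homozygosity(4, 2.0))
    print(dirichlet_homozygosity(10 ** 6, 1.0))   # tends to 1/(1 + theta)
```

As $K$ grows with $\theta$ fixed the closed form tends to $1/(1 + \theta)$.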

Suppose now the gene frequencies are arranged in decreasing order

$$x_{(1)} \ge x_{(2)} \ge x_{(3)} \ge \cdots \ge x_{(K)} \ge 0. \tag{5.127}$$

These frequencies are called the order statistics of $x_1, x_2, \ldots, x_K$.
Their joint distribution is, directly from (5.126),

$$f(x_{(1)}, \ldots, x_{(K-1)}) = \frac{K!\,\Gamma(K\epsilon)}{\{\Gamma(\epsilon)\}^K}\{x_{(1)} x_{(2)} \cdots x_{(K)}\}^{\epsilon - 1}. \tag{5.128}$$

From this distribution the joint distribution of the first $j$ order
statistics $x_{(1)}, x_{(2)}, \ldots, x_{(j)}$ may be found, although we do not
give the formula here.

The limiting case $K \to \infty$ is of special interest. While the joint
distribution (5.126) has no nontrivial limit, Kingman (1975, 1977b) found that
the distribution of the first $j$ order statistics does converge, for any $j$,
to a nontrivial limit, called by Kingman the Poisson-Dirichlet distribution.
(A more appropriate name is the Kingman distribution.)

The form of this distribution coincides with the joint distribution of the
first $j$ order statistics in the infinitely many alleles model. This
remarkable result is most important as it allows us, so long as we concentrate
on order statistics and functions derived from them, to move freely between
the $K$-allele model in Section 3.5 and the infinitely many alleles model of
Section 3.6. This allows us to approach certain problems in the infinitely
many alleles model either directly or through the $K$-allele model, whichever
we prefer. To illustrate this, the probability that two genes taken at random
in the infinitely many alleles model are of the same allelic type has been
computed directly in (3.74), and also can be found by letting $K \to \infty$ in
(3.70). The reason why the two approaches yield the same value is that the
probability in question can be expressed in terms of the order statistics as

$$E(F) = E(x_{(1)}^2 + x_{(2)}^2 + x_{(3)}^2 + \cdots), \tag{5.129}$$

where $F$ is the (random) population homozygosity. The eigenvalue set
(3.90) for the infinitely many alleles model can also be found through a
limiting process from the $K$-allele case.

The expression for the Poisson-Dirichlet distribution is rather complex.
The distribution of $x_{(1)}$, at least over the range (0.5, 1), has already been
noted in (5.65). For $(2N)^{-1} < x_{(1)} < 0.5$, Watterson and Guess (1977)
show that the density function of $x_{(1)}$ is of the form

$$f(x_{(1)}) = \Gamma(\theta + 1)\, e^{\gamma\theta}\, x_{(1)}^{\theta - 2}\, g\bigl((1 - x_{(1)})/x_{(1)}\bigr), \tag{5.130}$$

where $\theta = 4Nu$, $\gamma$ is Euler's constant, and $g(\cdot)$ is a complicated function
which is best defined through the Laplace transform equation

$$\int_0^\infty e^{-tz} g(z)\, dz = \exp\Bigl(\theta \int_0^1 y^{-1}(e^{-ty} - 1)\, dy\Bigr). \tag{5.131}$$

More generally the joint density function of $x_{(1)}, x_{(2)}, \ldots, x_{(r)}$ is

$$f(x_{(1)}, \ldots, x_{(r)}) = \theta^r\, \Gamma(\theta)\, e^{\gamma\theta}\, g(y)\, \{x_{(1)} x_{(2)} \cdots x_{(r-1)}\}^{-1}\, x_{(r)}^{\theta - 2}, \tag{5.132}$$

where $y = (1 - x_{(1)} - x_{(2)} - \cdots - x_{(r)})/x_{(r)}$. These conclusions are noted
by Watterson (1976b). The expression (5.132) simplifies, when
$x_{(1)} + \cdots + x_{(r-1)} + 2x_{(r)} \ge 1$, to

$$f(x_{(1)}, \ldots, x_{(r)}) = \theta^r\, \{x_{(1)} x_{(2)} \cdots x_{(r)}\}^{-1}\, (1 - x_{(1)} - \cdots - x_{(r)})^{\theta - 1}. \tag{5.133}$$

One interesting conclusion to be reached from (5.130) concerns the age of
the most frequent allele currently observed in the population. In particular,
the following question can be asked: What is the probability that the most
frequent allele in the population is also the oldest? Time-reversal
arguments show that this is identical to the probability that the most frequent
allele will last longest into the future. Given the current allele frequency
configuration this is, by symmetry, the current frequency $x_{(1)}$ of the most
frequent allele. The unconditional probability in question is then just the
mean value of $x_{(1)}$, namely

$$\int_{(2N)^{-1}}^{1} x_{(1)}\, f(x_{(1)})\, dx_{(1)}, \tag{5.134}$$

where $f(x_{(1)})$ is given by (5.130). The simple form for this density in (0.5,
1) shows that a lower bound to this probability is

$$\theta \int_{0.5}^{1} x_{(1)} \bigl\{x_{(1)}^{-1}(1 - x_{(1)})^{\theta - 1}\bigr\}\, dx_{(1)} = \bigl(\tfrac{1}{2}\bigr)^{\theta}. \tag{5.135}$$

Watterson and Guess (1977) compute a more accurate value using (5.130)
and (5.134). In a similar way the probability that the ith most frequent
allele is the oldest can be computed from the expected value of the ith
order statistic.
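The mean in (5.134) can also be estimated by simulation and compared with the
lower bound (5.135). The sketch below is not taken from Watterson and Guess
(1977): it assumes the standard stick-breaking construction, in which fractions
drawn from a Beta(1, θ) distribution are successively broken off the unit
interval and the resulting pieces, once ordered, follow the Poisson-Dirichlet
distribution. The function names, sample size and seed are illustrative.

```python
import random


def largest_frequency(theta, rng):
    """Largest piece in one stick-breaking sample with parameter theta.

    Breaking stops as soon as the unbroken remainder is smaller than the
    largest piece found so far, since no later piece can then exceed it.
    """
    remaining, largest = 1.0, 0.0
    while remaining > largest:
        piece = remaining * rng.betavariate(1.0, theta)
        largest = max(largest, piece)
        remaining -= piece
    return largest


def estimate_mean_largest(theta, n_samples=20000, seed=1):
    """Monte Carlo estimate of the mean of x_(1), the integral in (5.134)."""
    rng = random.Random(seed)
    total = sum(largest_frequency(theta, rng) for _ in range(n_samples))
    return total / n_samples


theta = 1.0
estimate = estimate_mean_largest(theta)
lower_bound = 0.5 ** theta  # right-hand side of (5.135)
print(f"theta = {theta}: estimated E[x_(1)] = {estimate:.3f}, "
      f"lower bound = {lower_bound:.3f}")
```

The estimate should lie above the bound, the gap being the contribution of the
range below 0.5 that (5.135) ignores.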

A further use for (5.132) arises in deriving the distribution of the
population homozygosity $F = x_1^2 + x_2^2 + \cdots$. As we have seen, F can equally
well be expressed as the sum of squares of ordered frequencies, so its
distribution for the K-allele model converges to that for the infinite allele model
as $K \to \infty$. The complete distribution of F in the K-allele model is not
simple. Nevertheless it is comparatively easy to find the mean and variance
of F from (5.125): We obtain

$$E(F) = (1 + \epsilon)/(1 + K\epsilon), \tag{5.136}$$

$$\mathrm{var}(F) = 2\theta(1 + \epsilon)/[(1 + K\epsilon)^2 (2 + K\epsilon)(3 + K\epsilon)], \tag{5.137}$$

where $\epsilon = \theta/(K - 1)$. The value for the mean is in effect computed by
(5.127) and is identical to that reached in (3.70) by different methods. By
letting $K \to \infty$ and using the convergence theory, the infinitely many alleles
model mean and variance of F are found as

$$E(F) = (1 + \theta)^{-1}, \qquad \mathrm{var}(F) = 2\theta/[(1 + \theta)^2 (2 + \theta)(3 + \theta)]. \tag{5.138}$$

The value for the mean is a standard result, cf. (3.74).
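The moment formulae (5.136)-(5.138) are easy to check numerically. The
following sketch, with illustrative function names and parameter values,
evaluates them and compares the K-allele values with a Monte Carlo sample from
the symmetric Dirichlet distribution. It assumes the usual construction of a
Dirichlet vector by normalizing independent gamma variables with shape ε.

```python
import random


def k_allele_moments(K, theta):
    """Mean and variance of F in the K-allele model, (5.136) and (5.137)."""
    eps = theta / (K - 1)
    mean = (1 + eps) / (1 + K * eps)
    var = 2 * theta * (1 + eps) / (
        (1 + K * eps) ** 2 * (2 + K * eps) * (3 + K * eps))
    return mean, var


def infinite_allele_moments(theta):
    """Limiting mean and variance of F, (5.138)."""
    mean = 1 / (1 + theta)
    var = 2 * theta / ((1 + theta) ** 2 * (2 + theta) * (3 + theta))
    return mean, var


def simulate_F(K, theta, n_samples=20000, seed=1):
    """Sample mean and variance of F under the symmetric Dirichlet density."""
    rng = random.Random(seed)
    eps = theta / (K - 1)
    values = []
    for _ in range(n_samples):
        g = [rng.gammavariate(eps, 1.0) for _ in range(K)]
        total = sum(g)
        values.append(sum((x / total) ** 2 for x in g))
    m = sum(values) / n_samples
    v = sum((f - m) ** 2 for f in values) / (n_samples - 1)
    return m, v


K, theta = 10, 2.0
exact_mean, exact_var = k_allele_moments(K, theta)
sim_mean, sim_var = simulate_F(K, theta)
print(f"K = {K}: formula {exact_mean:.4f}, {exact_var:.5f}; "
      f"simulation {sim_mean:.4f}, {sim_var:.5f}")
print("large-K check:", k_allele_moments(10**6, theta),
      infinite_allele_moments(theta))
```

For large K the first pair of values approaches the second, which is the
convergence used to obtain (5.138).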

Suppose now in the K-allele model that selective differences exist. Let
the fitness of $A_iA_j$ be $1 + s_{ij}$ and suppose $s_{ij}$ is of order $N^{-1}$. Writing
$\alpha_{ij} = 2Ns_{ij}$ and assuming mutation as above, the joint density function of
$x_1, \ldots, x_{K-1}$ can be found by solving (5.125), where now $m_i(x_1, \ldots, x_{K-1})$
contains a further term due to selective differences. The explicit solution
(Wright, 1949, p. 383) is

$$f(x_1, \ldots, x_{K-1}) = \mathrm{const}\, \exp\Bigl(\sum_{i=1}^{K} \sum_{j=1}^{K} x_i x_j \alpha_{ij}\Bigr) \{x_1 \cdots x_K\}^{\epsilon - 1}. \tag{5.139}$$

In particular, if all heterozygotes have equal fitness 1 + s (s > 0) and all
homozygotes have fitness 1, and if $\alpha = 2Ns$, (5.139) becomes

$$f(x_1, \ldots, x_{K-1}) = \mathrm{const}\, \exp\{-\alpha F\} \{x_1 \cdots x_K\}^{\epsilon - 1}. \tag{5.140}$$

We have noted that the limiting ($K \to \infty$) behavior can be analyzed only
by transferring attention to the order statistics $x_{(1)}, x_{(2)}, \ldots, x_{(K)}$.
Unfortunately the joint distribution of the order statistics is here too complicated
to yield useful information. Considerable progress can be made, however,
when the $\alpha_{ij}$ are small, and thus for the rest of this chapter we assume this
is the case. We thus put, in (5.139),

$$\exp\Bigl\{\sum_i \sum_j x_i x_j \alpha_{ij}\Bigr\} = 1 + \sum_i \sum_j x_i x_j \alpha_{ij} + \cdots.$$

The joint distribution of the order statistics may be found by summing in
(5.139) over all possible permutations of 1, 2, . . . , K. This yields

$$f(x_{(1)}, x_{(2)}, \ldots, x_{(K)}) = \frac{K!\,\Gamma(K\epsilon)}{\{\Gamma(\epsilon)\}^K} \Bigl(1 + A\Bigl(F - \frac{\epsilon + 1}{K\epsilon + 1}\Bigr) + O(\alpha_{ij}^2)\Bigr) \{x_{(1)} x_{(2)} \cdots x_{(K)}\}^{\epsilon - 1}, \tag{5.141}$$

where

$$A = K^{-1} \sum_i \alpha_{ii} - \{K(K - 1)\}^{-1} \sum\sum_{i \ne j} \alpha_{ij}, \qquad F = \sum_i x_{(i)}^2.$$

We are interested in two particular fitness schemes. The first, or "heterotic",
scheme has just been noted: All heterozygotes have fitness 1 + s and all
homozygotes have fitness 1. Here $\alpha_{ii} = 0$, $\alpha_{ij} = \alpha = 2Ns$ $(i \ne j)$, so that

$$f(x_{(1)}, \ldots, x_{(K)}) = \frac{K!\,\Gamma(K\epsilon)}{\{\Gamma(\epsilon)\}^K} \Bigl(1 - \alpha\Bigl\{F - \frac{\epsilon + 1}{K\epsilon + 1}\Bigr\} + O(\alpha^2)\Bigr) \{x_{(1)} \cdots x_{(K)}\}^{\epsilon - 1}. \tag{5.142}$$

In the second selective model (the "deleterious alleles" model) a fraction
$\gamma$ of the K alleles are deleterious: individuals carrying i deleterious genes
(i = 0, 1, 2) have fitness 1 - is. Here A = 0 and the terms in $\alpha_{ij}^2$ in (5.141)
must be computed. This leads to

$$f(x_{(1)}, \ldots, x_{(K-1)}) = \frac{K!\,\Gamma(K\epsilon)}{\{\Gamma(\epsilon)\}^K} \Bigl(1 + 2\alpha^2\gamma(1 - \gamma)\Bigl(F - \frac{\epsilon + 1}{K\epsilon + 1}\Bigr) + O(\alpha^3)\Bigr) \{x_{(1)} \cdots x_{(K)}\}^{\epsilon - 1}. \tag{5.143}$$

From (5.142) and (5.143) the joint distribution of $x_{(1)}, \ldots, x_{(j)}$ may, in
principle, be found. From Kingman's theory this joint distribution will
converge, as $K \to \infty$, to a nontrivial limit, namely that of the first j order
statistics in the infinite allele heterosis and deleterious alleles models. (In
the latter model $\gamma$ is defined as the probability that any new mutant allele
is deleterious.) We do not pursue these distributions here other than to
note that since to the order of approximation considered (5.143) can be
obtained from (5.142) by replacing $\alpha$ by $-2\alpha^2\gamma(1 - \gamma)$, the same must be
true in the limiting order statistics distribution and any function derived
from it. The information of most use to us may be found from one such
function, namely the frequency spectrum of the two models considered.
The frequency spectrum $\phi(x)$ was introduced in Section 3.6: It has the
interpretation that $\phi(x)\,\delta x$ is the mean number of alleles in the population
with frequency in $(x, x + \delta x)$. From this frequency spectrum we can find
three quantities of some interest in theoretical population genetics:

$$\int_0^1 \phi(x)\, dx = \text{mean number of alleles in the population}, \tag{5.144}$$

$$\int_0^1 x^2 \phi(x)\, dx = \text{mean population homozygosity}, \tag{5.145}$$

$$\int_0^1 \{1 - (1 - x)^n\}\, \phi(x)\, dx = \text{mean number of alleles in a sample of size } n \text{ from the population}. \tag{5.146}$$

The mean population homozygosity, E(F), is the probability that two genes
taken at random from the population are of the same allelic type. For the
neutral infinitely many alleles model,

$$\phi(x) \approx 2N\theta, \quad 0 < x < (2N)^{-1}; \qquad \phi(x) \approx \theta x^{-1}(1 - x)^{\theta - 1}, \quad (2N)^{-1} \le x \le 1. \tag{5.147}$$

Use of (5.147) in (5.144), (5.145) and (5.146) re-derives the quantities
(3.92), (3.74) and (3.85) found in Chapter 3 by other methods. Our aim is to
consider the corresponding three values in the "heterotic" and "deleterious
alleles" selection models.

Watterson (1977a) demonstrated that in the heterotic model the
frequency spectrum becomes, for $(2N)^{-1} < x < 1$,

$$\phi(x) = \theta x^{-1}(1 - x)^{\theta - 1}\bigl(1 + \alpha x\{2 - (2 + \theta)x\}(1 + \theta)^{-1} + O(\alpha^2)\bigr). \tag{5.148}$$

It follows immediately from (5.144)-(5.148) that the mean number of alleles
in the population exceeds its neutral theory value by an amount

$$\alpha\theta(1 + \theta)^{-2} + O(\alpha^2), \tag{5.149}$$

that

$$E(F) = (1 + \theta)^{-1} - 2\alpha\theta\{(1 + \theta)^2(2 + \theta)(3 + \theta)\}^{-1}, \tag{5.150}$$

and that the mean number of alleles observed in a sample of n genes exceeds
its neutral theory value by

$$\alpha\theta(1 + \theta)^{-2} - \alpha\theta(\theta + 2n)\{(1 + \theta)(n + \theta)(n + 1 + \theta)\}^{-1}. \tag{5.151}$$

Clearly, for large n, this is approximately equal to the expression in (5.149),
as we expect.
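The three corrections (5.149)-(5.151) can be checked by integrating the term of
order α in (5.148) exactly, since each integral reduces to Beta functions. The
sketch below is an illustration only: it integrates over the whole interval
(0, 1), neglecting the short range below $(2N)^{-1}$, and the function names and
parameter values are arbitrary choices.

```python
from math import exp, isclose, lgamma


def beta(a, b):
    """Beta function B(a, b), computed through log-gamma functions."""
    return exp(lgamma(a) + lgamma(b) - lgamma(a + b))


def selection_term(theta, alpha, x_power=0, extra=0):
    """Integral over (0, 1) of x**x_power * (1 - x)**extra times the
    order-alpha part of (5.148), namely
    theta * (1 - x)**(theta - 1) * alpha * {2 - (2 + theta) x} / (1 + theta).
    """
    c = alpha * theta / (1 + theta)
    a = x_power + 1      # exponent of x, plus one
    b = theta + extra    # exponent of (1 - x), plus one
    return c * (2 * beta(a, b) - (2 + theta) * beta(a + 1, b))


theta, alpha, n = 1.5, 0.1, 20

# (5.149): excess in the mean number of alleles in the population.
excess_population = selection_term(theta, alpha)
# (5.150): change in mean homozygosity relative to 1 / (1 + theta).
change_homozygosity = selection_term(theta, alpha, x_power=2)
# (5.151): excess in the mean number of alleles in a sample of n genes.
excess_sample = excess_population - selection_term(theta, alpha, extra=n)

print(excess_population, alpha * theta / (1 + theta) ** 2)
print(change_homozygosity,
      -2 * alpha * theta / ((1 + theta) ** 2 * (2 + theta) * (3 + theta)))
print(excess_sample,
      alpha * theta / (1 + theta) ** 2
      - alpha * theta * (theta + 2 * n)
      / ((1 + theta) * (n + theta) * (n + 1 + theta)))
```

Each printed pair compares the integral with the closed form quoted in the text.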

We have noted that for the deleterious alleles model the parallel
quantities may be found by replacing $\alpha$ by $-2\alpha^2\gamma(1 - \gamma)$. Thus for this model the
mean number of alleles in the population falls short of the neutral value by

$$2\theta\alpha^2\gamma(1 - \gamma)(1 + \theta)^{-2}, \tag{5.152}$$

the mean homozygosity is given by

$$E(F) = (1 + \theta)^{-1} + 4\alpha^2\gamma(1 - \gamma)\theta\{(1 + \theta)^2(2 + \theta)(3 + \theta)\}^{-1}, \tag{5.153}$$

while the mean number of alleles in a sample of n falls short of its neutral
theory value by

$$2\alpha^2\gamma(1 - \gamma)\theta(1 + \theta)^{-2} - 2\alpha^2\gamma(1 - \gamma)\theta(\theta + 2n)\{(1 + \theta)(n + \theta)(n + 1 + \theta)\}^{-1}. \tag{5.154}$$




6 Two Loci



6.1 Introduction 

Most of the theory of this book so far has assumed that the fitness of any 
individual depends on his genetic make-up at a single locus. Although for 
certain specific purposes this assumption may give reasonable approxima- 
tions, it is in general a gross simplification, in particular when epistatic, 
that is interactive, effects arise between loci. In this chapter we suppose 
that the fitness of any individual depends on his genetic constitution at 
two (or sometimes three) loci. Although this assumption is hardly less re- 
alistic than the previous one, it does allow substantial advance to be made, 
as has been noted in Section 2.10, on assessing the evolutionary effect of 
recombination between loci. It also allows us to assess the extent to which 
two-locus behavior is predictable from combining two single-locus analyses.
We shall also see later in this chapter that it allows an investigation of the 
effects of modifier genes. 

The model we use is that described in Section 2.10. We assume viability 
selection only, with fitness scheme given by (2.90) or, equivalently, by (2.91), 
random mating, no fitness differentials between sexes and a discrete time 
parameter. Thus the recurrence relations (2.94) describe the evolution of 
the frequencies of the gametes $A_1B_1$, $A_1B_2$, $A_2B_1$ and $A_2B_2$, that is of
gametes of types 1, 2, 3, 4 respectively. 

We have already observed, in Section 2.10, two major consequences of
these recurrence relations. The first is that the mean fitness increase
theorem, claiming that under random mating mean fitness increases from one
generation to the next, or at worst remains stable, is no longer true as a
mathematical theorem. The second is that the equilibrium points of the
recurrence system can depend on the recombination fraction between the
loci. We start by examining these conclusions in greater detail.



6.2 Evolutionary Properties of Mean Fitness 

Our aim in this section is to discuss the implications of the recurrence 
relations (2.94) for the evolution of mean fitness, defined by (2.93). The 
first substantial analysis of multilocus mean fitness behavior was given 
by Kimura (1958), although here we adopt a rather different approach 
than his. A second important early discussion of the question, not perhaps 
sufficiently highly appreciated, is that of Kojima and Kelleher (1961). The
fact that mean fitness can decrease in two-locus systems was first explicitly 
mentioned in their paper. 

If there is linkage disequilibrium at any stable equilibrium point, as is 
true, for example, at the equilibria (2.97) or (2.98), it is always possible to 
find a neighborhood of the equilibrium point such that, starting from any 
point in this neighborhood, the mean fitness decreases. Thus the MFIT 
cannot be true as a mathematical theorem, or in some cases be even ap- 
proximately correct. Karlin (1975) indeed asserts that it “usually fails” in 
the sense that for almost all sets of possible genotype fitness values, it is 
possible to find gamete frequencies for which the mean fitness decreases, at 
least for a few generations. 

These considerations, however, are probably of lesser importance from a 
practical standpoint. We are mainly interested in the behavior of mean fit- 
ness during those generations when substantial changes in gene frequency 
occur, and here it is possible to rescue in large part the spirit of the MFIT, 
at least in a wide variety of cases of practical interest. That such an at- 
tempt is not over-optimistic is supported by the observations of Karlin 
and Carmelli (1975), who observed that when the entries in the fitness 
matrix (2.90) are chosen randomly, the mean fitness increases for most 
generations for most fitness configurations. It is important, therefore, to 
emphasize that our aim is not to prove a mathematical theorem but to 
determine circumstances in which increases in mean fitness tend to occur. 

The first attempt at finding some principle along these lines was 
through the introduction of the principle of quasi-linkage equilibrium 
(QLE) (Kimura (1965)). The essence of this principle is as follows. If we 
define, in the notation of Section 2.10, a quantity Z by

$$Z = c_1 c_4 / (c_2 c_3), \tag{6.1}$$

then to a first order of approximation the change in the value of $\log Z$
between consecutive generations is

$$\Delta \log Z \approx c_1^{-1}\Delta c_1 + c_4^{-1}\Delta c_4 - c_2^{-1}\Delta c_2 - c_3^{-1}\Delta c_3. \tag{6.2}$$

The values of $\Delta c_i$ can be found from (2.94) as

$$\Delta c_i = \bar{w}^{-1}\{c_i(w_i - \bar{w}) + \eta_i R w_{14}(c_2 c_3 - c_1 c_4)\}. \tag{6.3}$$

Substitution of these values into (6.2) eventually leads to the approximation

$$\bar{w}\, \Delta \log Z \approx \epsilon - R w_{14}(Z - 1)\{c_2 + c_3 + Z^{-1}(c_1 + c_4)\}, \tag{6.4}$$

where

$$\epsilon = w_1 - w_2 - w_3 + w_4.$$

Suppose now that Z > 1. If $\epsilon$ can be treated, approximately, as a constant,
there will be a tendency for Z to decrease, at least for values of Z sufficiently
large compared to $\epsilon$. Similarly when Z < 1 there will be a tendency for Z
to increase, at least for very small Z. We may thus hope that Z approaches
a constant value at which

$$\Delta \log Z = 0. \tag{6.5}$$

The change in mean fitness can be approximated, if small-order terms are
ignored, by

$$\Delta \bar{w} \approx 2 \sum_i w_i\, \Delta c_i,$$

and substitution from (6.3) then gives

$$\Delta \bar{w} \approx 2\bar{w}^{-1}\Bigl\{\sum_i c_i w_i (w_i - \bar{w}) + R w_{14}(c_2 c_3 - c_1 c_4)\,\epsilon\Bigr\}. \tag{6.6}$$

If now (6.5) holds we must have, from (6.4),

$$R w_{14}(c_2 c_3 - c_1 c_4) = -\epsilon \Bigl(\sum_i c_i^{-1}\Bigr)^{-1},$$

and substituting this into (6.6) we find

$$\Delta \bar{w} \approx \bar{w}^{-1}\Bigl(2\sum_i c_i (w_i - \bar{w})^2 - 2\epsilon^2 \Bigl(\sum_i c_i^{-1}\Bigr)^{-1}\Bigr). \tag{6.7}$$

The two terms in the parentheses on the right-hand side have arisen in
our previous discussion. The first is the total gametic variance defined in
(2.106), and the second is the epistatic gametic variance defined in (2.107).
The difference between the two must be the additive gametic variance, or
equivalently, in view of the discussion in Section 2.10, the additive genetic
variance. We then conclude that whenever (6.5) holds,

$$\Delta \bar{w} \approx \bar{w}^{-1} \sigma_A^2. \tag{6.8}$$

Provided the above reasoning is satisfactory, we may therefore expect the
system to evolve rapidly to a state where the approximation (6.8) is true,
and if this is so we have succeeded in our aim of rescuing the MFIT as a
reasonable general principle.

The above reasoning, however, requires much closer examination. Clearly 
any well-behaved function Y of gamete frequencies will converge to its 
equilibrium as the system itself approaches its equilibrium state, and at the 
moment we have no reason to prefer calculations using Z to those using 
the function Y. But when Y and Z are different functions, the value of $\Delta \bar{w}$
found by assuming $\Delta Y = 0$ will be different from that assuming $\Delta Z = 0$.
Indeed numerical calculations by Kimura (1965) and Ewens (1976) show
that mean fitness usually changes at a much slower rate than does Z. It
is therefore unreasonable to consider changes in mean fitness assuming
$\Delta Z = 0$. Instead it would be more reasonable to consider changes in Z
assuming $\Delta \bar{w} = 0$.

Despite these comments, it is possible to arrive at the approximation
(6.8) by a deeper argument, at least in cases of biological interest. This
was done by Nagylaki (1976); see also Hoppensteadt (1976) and Conley
(1972). We now outline the main points of Nagylaki's argument.

Suppose that the fitness differences in the system (2.90) are small, so
that $w_{ij}$ can be written as $1 + s a_{ij}$, where s is a small parameter and the
$a_{ij}$ are moderate or small. We consider the linkage disequilibrium measure
$D$ $(= c_1 c_4 - c_2 c_3)$. From the recurrence relations (2.94),

$$\Delta D = -RD + s f(c_i, a_{ij}), \tag{6.9}$$

where the exact form of the function f is not important. This leads to

$$D(t) = (1 - R)^t D(0) + s(1 - R)^t \sum_{u=1}^{t} (1 - R)^{-u} f(c_i(u - 1), a_{ij}), \tag{6.10}$$

where D(t) is the value of D in generation t. Clearly, for R moderate, there
exists a time $t_1$ for which $D(t_1)$ is of order s, so we can write, for $t > t_1$,
$D(t) = sD^*(t)$ and hence, from (6.9),

$$\Delta D^*(t) = -RD^*(t) + f(c_i(t), a_{ij}). \tag{6.11}$$

We also know, from (2.94), that $\Delta c_i$ is of order s at most, and hence

$$f(c_i(t), a_{ij}) - f(c_i(t - 1), a_{ij}) \tag{6.12}$$

is of order s at most. Hence, from (6.11) and (6.12),

$$\Delta D^*(t) - R^{-1}\{f(c_i(t), a_{ij}) - f(c_i(t - 1), a_{ij})\} = -R\{D^*(t) - R^{-1} f(c_i(t), a_{ij})\} + O(s),$$

or

$$\Delta G(t) = -RG(t) + O(s), \tag{6.13}$$

where

$$G(t) = D^*(t) - R^{-1} f(c_i(t), a_{ij}). \tag{6.14}$$

There will also exist a time $t_2$ such that

$$G(t_1)(1 - R)^{t_2 - t_1} < s,$$

and from (6.14) and (6.11) it then follows that for $t > t_2$,

$$\Delta D = O(s^2). \tag{6.15}$$

Since $D = c_2 c_3 (Z - 1)$, it follows that

$$\Delta Z = O(s^2) \quad \text{for } t > t_2. \tag{6.16}$$

We turn now to changes in mean fitness. A more exact computation than
that leading to (6.8) yields

$$\Delta \bar{w} = \Bigl(\sigma_A^2 - 2RD\epsilon + 2\epsilon^2 \Bigl(\sum_i c_i^{-1}\Bigr)^{-1}\Bigr) + O(s^3) \tag{6.17}$$

for $t > t_1$. Further, a more exact computation than that leading to (6.4)
gives

$$Z^{-1} \Delta Z = \epsilon - RD \sum_i c_i^{-1} + O(s^2). \tag{6.18}$$

If we use this equation, (6.17) becomes

$$\Delta \bar{w} = \Bigl(\sigma_A^2 + 2\epsilon \Bigl(\sum_i c_i^{-1}\Bigr)^{-1} Z^{-1} \Delta Z\Bigr) + O(s^3). \tag{6.19}$$

Now $\epsilon$ is of order s, and then (6.16) shows that for $t > t_2$, $\Delta Z$ is of order $s^2$.
Thus the second term in the parentheses on the right-hand side of (6.19)
is $O(s^3)$ for $t > t_2$. Since in general $\sigma_A^2$ is $O(s^2)$, it follows that for $t > t_2$,

$$\Delta \bar{w} \approx \sigma_A^2 \tag{6.20}$$

during those epochs of the process for which the various order of magnitude
arguments hold. On the other hand, $\sigma_A^2$ is very close to zero when the
system is near an equilibrium point, so that it is possible at that stage of
the process that the second term in parentheses in (6.19) dominates the
first term, leading to possible decreases in mean fitness. These decreases
are probably small and of little evolutionary consequence.

The correct statement of the QLE principle is thus the following. For
small selective differences (of order s) and loose linkage, a state soon arises
where the change in mean fitness is given by the right-hand side in (6.19),
where $\Delta Z$ is of order $s^2$ and $\epsilon$ is of order s. Since $\sigma_A^2$ is usually of order $s^2$
to a leading order of approximation, (6.20), embodying the main concepts
of the MFIT, is usually true. It is not correct to say, as some versions of the
QLE principle claim, that (6.20) is true because changes in Z are smaller
than changes in $\bar{w}$ and can be ignored: It is possible (see Kimura (1965))
that $\Delta Z \to \infty$ and yet (6.20) still holds.
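The approach to quasi-linkage equilibrium can be followed numerically. The
sketch below iterates the standard two-locus viability recurrence (assumed
here to be the form of (2.94), with $w_{14} = w_{23}$) and prints, for selected
generations, the change in mean fitness, the additive variance computed as in
(6.7), and $\Delta Z$. The fitness table, the starting frequencies and all function
names are illustrative choices with s = 0.02; they are not the matrix (2.96).

```python
def gametic_fitnesses(G):
    """Expand a 3x3 table G[A genotype][B genotype] into the 4x4 matrix w_ij.

    Gametes are ordered A1B1, A1B2, A2B1, A2B2; genotypes are indexed by
    the number of A2 genes (rows) and B2 genes (columns) carried.
    """
    a2 = (0, 0, 1, 1)
    b2 = (0, 1, 0, 1)
    return [[G[a2[i] + a2[j]][b2[i] + b2[j]] for j in range(4)]
            for i in range(4)]


def marginals(c, W):
    """Marginal gamete fitnesses w_i and the mean fitness."""
    w = [sum(W[i][j] * c[j] for j in range(4)) for i in range(4)]
    return w, sum(ci * wi for ci, wi in zip(c, w))


def next_generation(c, W, R):
    """One generation of selection followed by recombination."""
    w, wbar = marginals(c, W)
    D = c[0] * c[3] - c[1] * c[2]
    eta = (-1.0, 1.0, 1.0, -1.0)
    return [(c[i] * w[i] + eta[i] * R * W[0][3] * D) / wbar for i in range(4)]


def additive_variance(c, W):
    """Total gametic variance minus epistatic gametic variance, as in (6.7)."""
    w, wbar = marginals(c, W)
    eps = w[0] - w[1] - w[2] + w[3]
    total = 2.0 * sum(ci * (wi - wbar) ** 2 for ci, wi in zip(c, w))
    return total - 2.0 * eps ** 2 / sum(1.0 / ci for ci in c)


def qle_rows(c, W, R, generations):
    """Rows (t, delta w-bar, additive variance, delta Z) for chosen t."""
    rows = []
    for t in range(max(generations) + 1):
        new_c = next_generation(c, W, R)
        if t in generations:
            dw = marginals(new_c, W)[1] - marginals(c, W)[1]
            dz = (new_c[0] * new_c[3] / (new_c[1] * new_c[2])
                  - c[0] * c[3] / (c[1] * c[2]))
            rows.append((t, dw, additive_variance(c, W), dz))
        c = new_c
    return rows


s = 0.02
G = [[1.0,         1.0 + 0.5 * s, 1.0 + s],
     [1.0 + s,     1.0 + 1.5 * s, 1.0 + 2 * s],
     [1.0 + 2 * s, 1.0 + 2.5 * s, 1.0 + 3.5 * s]]
rows = qle_rows([0.4, 0.3, 0.2, 0.1], gametic_fitnesses(G), 0.5,
                (0, 1, 2, 5, 10, 20, 30, 50))
for t, dw, var_a, dz in rows:
    print(f"t={t:3d}  dw={dw:.3e}  sigma_A^2={var_a:.3e}  dZ={dz:.3e}")
```

With loose linkage (R = 0.5) the printed $\Delta Z$ should fall quickly from its
initial value, after which the first two columns should nearly agree, in the
spirit of (6.20).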

It is of some interest to illustrate this conclusion by some numerical
examples. Table 6.1 shows the values of $\Delta \bar{w}$, $\Delta Z$, $\sigma_A^2$ and $\epsilon$ for the evolution
of a system with fitness matrix (2.96), with R = 0.5 and with various
initial gamete frequencies. The table also gives the value of
$P = \sigma_A^2 + 2\epsilon(\sum c_i^{-1})^{-1} Z^{-1} \Delta Z$, the right-hand side in (6.19), which from the above
discussion we expect to provide a close approximation to $\Delta \bar{w}$. For these
fitness values we may take $s \approx 0.02$.

Generation   Δw̄ (×10^4)    ΔZ (×10^4)   σ_A² (×10^4)     P (×10^4)   ε (×10^2)

Case 1: Initial gamete frequencies 0.11, 0.16, 0.39, 0.34
     1          -1.67         1,750.0        0.532          -2.12        0.793
     2          -0.462        1,020.0        0.496          -0.541       0.728
     5           0.376          140.0        0.450           0.350       0.676
    10           0.386            3.39       0.402           0.400       0.690
    20           0.310           -0.526      0.322           0.322       0.719
    30           0.246           -0.401      0.254           0.255       0.744
    40           0.193           -0.301      0.200           0.201       0.758
    50           0.152           -0.224      0.157           0.157       0.775
   100           0.045           -0.046      0.046           0.046       0.805
   200           0.005            0.002      0.005           0.005       0.807

Case 2: Initial gamete frequencies 0.42, 0.09, 0.11, 0.38
     1          -8.26       128,000.0        0.096          -6.27        0.073
     2           1.23        15,600.0        0.054          -0.334      -0.384
     5           0.693          777.0        0.034           0.648      -0.773
    10           0.047           19.9        0.026           0.047      -0.821
    20           0.019            0.032      0.020           0.020      -0.822
    30           0.014            0.043      0.014           0.015      -0.818
    40           0.010            0.038      0.011           0.011      -0.821
    50           0.008            0.034      0.008           0.008      -0.818
   100           0.002            0.015      0.002           0.002      -0.810

Case 3: Initial gamete frequencies 0.00001, 0.48386, 0.51612, 0.00001
     1         -78.3          1,180.0        0.000   1,130,000.0        -2.288
     2         -23.6          2,560.0        0.002         -47.2        -1.533
     5          -1.36           970.0        0.004          -1.51       -0.895
    10          -0.030           32.4        0.004          -0.030      -0.812
    20           0.003            0.042      0.003           0.003      -0.808
    30           0.002            0.013      0.002           0.002      -0.808
    40           0.002            0.010      0.002           0.002      -0.807
    50           0.001            0.009      0.002           0.002      -0.807

Table 6.1. Parameters associated with the evolution of the two-locus system with
fitness matrix (2.96); R = 0.5

The values in the table illustrate the various points made above. First,
the mean fitness may decrease in the early generations, but eventually it
begins to increase. Second, the values of $\Delta Z$ and $\sigma_A^2$ are, after the early
generations, both of order $s^2$ while $\epsilon$ is of order s. Finally, and most important,
values of $\Delta \bar{w}$ are closely approximated by the quantity P throughout the
process and, after the initial generations, by $\sigma_A^2$. The only exception to
this rule arises in case 3 where the population starts in extreme linkage
disequilibrium. Here the early generations of the process show large values
for $\Delta Z$ and $\Delta \bar{w}$ and exceptionally large values of P. The additive genetic
variance, however, is quite small. This is clearly an extreme case where the
behavior of the system before time $t_1$ cannot be predicted from the above
analysis.

In his original discussion of the QLE principle, Kimura (1965) provided
an example in which the approximation (6.20) is quite accurate after a
number of generations has passed, even though $\Delta Z \to \infty$ as $t \to \infty$. The
state of QLE does not arise for some time, approximately 100 generations
in this example, due to the very low value 0.0001 of R that is assumed. In
this example the numerical values in the fitness matrix (2.91) are

$$\begin{array}{ccc}
1.00 & 1.00 & 0.95 \\
1.00 & 1.00 & 0.95 \\
0.95 & 0.95 & 1.10
\end{array}$$

and the initial gamete frequencies are all 0.25. In generation 100, $\Delta Z \approx 400$
and yet $\Delta \bar{w} = 11.59 \times 10^{-5}$, $\sigma_A^2 = 11.01 \times 10^{-5}$. Clearly we cannot say that
$\Delta \bar{w} \approx \sigma_A^2$ because $\Delta Z \approx 0$. Despite the large value of $\Delta Z$, the quantity
$P$ $(= \sigma_A^2 + 2\epsilon(\sum c_i^{-1})^{-1} Z^{-1} \Delta Z)$ takes the value $12.6 \times 10^{-5}$. This occurs
because of the extremely small values of $\epsilon$ and $Z^{-1}$. This shows that (6.19)
holds and the QLE principle in the Nagylaki interpretation applies.

There are several points one can make in conclusion. First, we have con- 
sidered only two alleles at the loci in question: The extension of the above 
arguments to many alleles has been made by Nagylaki (1977b). Second, 
it is of some interest to calculate the total change in mean fitness from 
the initial point to the equilibrium point. This may sometimes be negative 
because large decreases in mean fitness during the early generations out- 
weigh the consistent but small increases during later generations. Nagylaki 
(1977) examined this question and suggests that this happens compara- 
tively seldom: In most cases the total course of the evolutionary process is 
to increase the mean fitness. 

Finally, one may ask whether there are any special classes of fitness
matrices for which the mean fitness always increases. One class of fitness
matrices where this is the case has been provided by Ewens (1969a, b).
Suppose that the fitness matrix (2.91) is in the "additive" form

$$\begin{array}{c|ccc}
 & B_1B_1 & B_1B_2 & B_2B_2 \\ \hline
A_1A_1 & \alpha_1 + \beta_1 & \alpha_1 + \beta_2 & \alpha_1 + \beta_3 \\
A_1A_2 & \alpha_2 + \beta_1 & \alpha_2 + \beta_2 & \alpha_2 + \beta_3 \\
A_2A_2 & \alpha_3 + \beta_1 & \alpha_3 + \beta_2 & \alpha_3 + \beta_3
\end{array} \tag{6.21}$$

The fitness of any individual is here the sum of two components, one
characterizing the genotype at the A locus and the other characterizing the
genotype at the B locus. For this fitness matrix the mean fitness $\bar{w}$ becomes

$$\begin{aligned}
\bar{w} = {} & \alpha_1(c_1 + c_2)^2 + 2\alpha_2(c_1 + c_2)(c_3 + c_4) + \alpha_3(c_3 + c_4)^2 \\
& + \beta_1(c_1 + c_3)^2 + 2\beta_2(c_1 + c_3)(c_2 + c_4) + \beta_3(c_2 + c_4)^2.
\end{aligned} \tag{6.22}$$

Suppose the gamete frequencies $c_1, c_2, c_3, c_4$ take any arbitrary values in
generation t. The mean fitness in generation t + 1 is found by replacing $c_i$ by
$c_i'$ in (6.22). Now (6.22) depends on the gamete frequencies only through the
gene frequencies $c_1 + c_2$, $c_3 + c_4$, $c_1 + c_3$, and $c_2 + c_4$. However, from the basic
recurrence relation (2.94), the gene frequencies $c_1' + c_2'$, $c_3' + c_4'$, $c_1' + c_3'$, and
$c_2' + c_4'$ are independent of R once the $c_i$ are given and thus, in particular, are
the same as for the special case R = 0. But when R = 0 the system (2.94) is
identical to a four-allele single-locus system, and then Kingman's theorem
(see Section 2.4) shows that mean fitness is nondecreasing. It follows that
in the two-locus system (6.21) the mean fitness is nondecreasing. While
we have used single-locus theory to obtain this result, it is not true that
the complete evolution of the system is identical to that of any four-allele
single-locus system: We have used the parallel with the latter merely to
assert that mean fitness is nondecreasing.
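Both steps of this argument can be illustrated numerically. The sketch below
builds an additive fitness matrix of the form (6.21), iterates the standard
two-locus viability recurrence (assumed here to be the form of (2.94)), and
checks that the next-generation gene frequencies do not depend on R and that
mean fitness never decreases along a trajectory. The values of α and β, the
random starting points and the function names are illustrative.

```python
import random


def additive_matrix(alpha, beta):
    """4x4 gametic fitness matrix with w_ij = alpha[A genotype] + beta[B genotype].

    Gametes are ordered A1B1, A1B2, A2B1, A2B2; alpha and beta are indexed
    by the number of A2 and B2 genes in the genotype, as in (6.21).
    """
    a2 = (0, 0, 1, 1)
    b2 = (0, 1, 0, 1)
    return [[alpha[a2[i] + a2[j]] + beta[b2[i] + b2[j]] for j in range(4)]
            for i in range(4)]


def mean_fitness(c, W):
    return sum(c[i] * W[i][j] * c[j] for i in range(4) for j in range(4))


def next_generation(c, W, R):
    """One generation of selection followed by recombination."""
    w = [sum(W[i][j] * c[j] for j in range(4)) for i in range(4)]
    wbar = sum(ci * wi for ci, wi in zip(c, w))
    D = c[0] * c[3] - c[1] * c[2]
    eta = (-1.0, 1.0, 1.0, -1.0)
    return [(c[i] * w[i] + eta[i] * R * W[0][3] * D) / wbar for i in range(4)]


def smallest_fitness_change(c, W, R, generations=300):
    """Smallest one-generation change in mean fitness along a trajectory."""
    smallest = float("inf")
    for _ in range(generations):
        new_c = next_generation(c, W, R)
        smallest = min(smallest, mean_fitness(new_c, W) - mean_fitness(c, W))
        c = new_c
    return smallest


W = additive_matrix(alpha=(0.40, 0.55, 0.45), beta=(0.50, 0.45, 0.35))

# Gene frequencies after one generation are the same for R = 0 and R = 0.5.
c0 = [0.1, 0.4, 0.3, 0.2]
no_recomb = next_generation(c0, W, 0.0)
free_recomb = next_generation(c0, W, 0.5)
print("A1 frequency:", no_recomb[0] + no_recomb[1],
      free_recomb[0] + free_recomb[1])
print("B1 frequency:", no_recomb[0] + no_recomb[2],
      free_recomb[0] + free_recomb[2])

# Mean fitness is nondecreasing from random starting points.
rng = random.Random(0)
changes = []
for R in (0.0, 0.05, 0.5):
    raw = [rng.random() + 0.01 for _ in range(4)]
    start = [x / sum(raw) for x in raw]
    changes.append(smallest_fitness_change(start, W, R))
    print(f"R = {R}: smallest change in mean fitness = {changes[-1]:.3e}")
```

Up to rounding error the smallest change should never be negative, whereas
for a non-additive fitness matrix this check can fail.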

This argument can be extended immediately to cover an arbitrary num- 
ber of alleles at each of the two loci and indeed an arbitrary number of 
loci with an arbitrary recombination pattern. Later we shall use this con- 
clusion to derive properties of additive fitness models. The result has been 
extended in a different way by Lyubich (1992), who found that the “in- 
crease in mean fitness” result also holds for a set of fitnesses generalizing 
the additive form given above (his fitnesses (9.5.11)). 

6.3 Equilibrium Points 

In the previous section we have been concerned with a specific property 
of the dynamics of the recurrence system (2.94). In the present (brief) 
section we turn to static properties and introduce the machinery whereby 
we examine the equilibrium properties of this system. 

If we write (2.94) in the form

$$c_i' = \phi_i(c_1, c_2, c_3, c_4, R), \tag{6.23}$$

the point $c_i = c_i^*$ is an equilibrium point if the $c_i^*$ satisfy the equations

$$c_i^* = \phi_i(c_1^*, c_2^*, c_3^*, c_4^*, R), \quad (i = 1, 2, 3, 4). \tag{6.24}$$

It is clear that the system (2.94) may possess several equilibrium points
and, further, that these points may depend on the value of R. We discuss
these observations in more detail later. When the fitness matrix (2.90)
possesses special properties the equilibrium equation (6.24) can often be solved
explicitly, but in general this cannot be done and numerical methods are
required. In the next section we present examples of equilibria found by
both methods, and we use both sets of results to discuss equilibrium
properties in some detail. We have already noted several connections between
mean fitness and the equilibrium points. Thus any unique stable equilibrium
point when R = 0 corresponds to a maximum of mean fitness, while
if the coefficient of linkage disequilibrium is nonzero at any stable
equilibrium for R > 0, the equilibrium mean fitness must be less than the
maximum possible fitness. Further, there are gamete frequency trajectories
near such equilibria along which mean fitness is decreasing, at least over
some generations.

Suppose an equilibrium point of the system (2.94) has been found. It is
then necessary to examine its stability behavior, since unstable equilibrium
points are of little interest. The local linear stability of the system is tested
by standard methods which we here outline. Suppose in any generation
$c_i = c_i^* + \delta_i$ (with $\sum \delta_i = 0$), where the $\delta_i$ are small deviations from
the equilibrium value. If the corresponding deviations in the following
generation are $\delta_i'$, then from (2.94),

$$\begin{aligned}
c_i^* + \delta_i' &= \phi_i(c_1^* + \delta_1, c_2^* + \delta_2, c_3^* + \delta_3, c_4^* + \delta_4) \\
&= \phi_i(c_1^*, c_2^*, c_3^*, c_4^*) + \sum_j \delta_j \Bigl[\frac{\partial \phi_i}{\partial c_j}\Bigr]^* + O(\delta^2).
\end{aligned} \tag{6.25}$$

Here $[f]^*$ means the function f evaluated at the equilibrium point. If we
ignore small-order terms, we get

$$\delta' = A\delta, \tag{6.26}$$

where A is a 4 x 4 matrix whose (i, j) term is $[\partial \phi_i / \partial c_j]^*$. Since

$$\delta^{(n)} = A^n \delta^{(0)}, \tag{6.27}$$

the spectral expansion of the matrix A shows that the equilibrium point is
locally linearly stable if and only if all eigenvalues of A are less than unity
in absolute value. These eigenvalues can be evaluated numerically or, in
special cases, found algebraically. The condition $\sum \delta_i = 0$ can be used to
simplify the calculations under both approaches.
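A numerical version of this stability test might look as follows. The sketch
locates an equilibrium by iterating the standard two-locus recurrence (assumed
here to be the form of (2.94)), approximates the matrix A by central finite
differences, and estimates the largest eigenvalue modulus by power iteration
on a deviation vector whose components sum to zero. Power iteration is a
stand-in for a full eigenvalue routine and assumes that the dominant
eigenvalue is real and simple. The additive fitness values, the step size and
all function names are illustrative.

```python
from math import sqrt


def additive_matrix(alpha, beta):
    """4x4 gametic fitness matrix for an additive two-locus model."""
    a2 = (0, 0, 1, 1)
    b2 = (0, 1, 0, 1)
    return [[alpha[a2[i] + a2[j]] + beta[b2[i] + b2[j]] for j in range(4)]
            for i in range(4)]


def phi(c, W, R):
    """The map c -> c' for one generation."""
    w = [sum(W[i][j] * c[j] for j in range(4)) for i in range(4)]
    wbar = sum(ci * wi for ci, wi in zip(c, w))
    D = c[0] * c[3] - c[1] * c[2]
    eta = (-1.0, 1.0, 1.0, -1.0)
    return [(c[i] * w[i] + eta[i] * R * W[0][3] * D) / wbar for i in range(4)]


def jacobian(c_star, W, R, h=1e-6):
    """Central-difference approximation to A, with A[i][j] = d(phi_i)/d(c_j)."""
    A = [[0.0] * 4 for _ in range(4)]
    for j in range(4):
        up, down = list(c_star), list(c_star)
        up[j] += h
        down[j] -= h
        f_up, f_down = phi(up, W, R), phi(down, W, R)
        for i in range(4):
            A[i][j] = (f_up[i] - f_down[i]) / (2 * h)
    return A


def dominant_modulus(A, start, iterations=500):
    """Power-iteration estimate of the largest eigenvalue modulus of A."""
    delta, ratio = list(start), 0.0
    for _ in range(iterations):
        image = [sum(A[i][j] * delta[j] for j in range(4)) for i in range(4)]
        norm_image = sqrt(sum(x * x for x in image))
        ratio = norm_image / sqrt(sum(x * x for x in delta))
        delta = [x / norm_image for x in image]
    return ratio


R = 0.5
W = additive_matrix(alpha=(0.30, 0.50, 0.35), beta=(0.46, 0.50, 0.48))

# Locate an equilibrium by iterating the recurrence from the centre point.
c_star = [0.25, 0.25, 0.25, 0.25]
for _ in range(5000):
    c_star = phi(c_star, W, R)

A = jacobian(c_star, W, R)
rho = dominant_modulus(A, start=(1.0, 0.3, -0.5, -0.8))
print("equilibrium:", [round(x, 6) for x in c_star])
print("estimated largest eigenvalue modulus:", round(rho, 6))
```

A value below unity indicates local linear stability in the sense described
above. The starting deviation is chosen to perturb both gene frequencies so
that the slowest mode is not missed.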

Our definition of stability is a local one only. Questions concerning global 
stability and domains of attraction are far harder to answer and, especially 
concerning the latter, little is known about them. 



6.4 Special Models 

Historically the first analyses, both static and dynamic, of the recurrence 
system (2.94) assumed special forms for the fitness matrix (2.90). While 
this fact often allowed explicit expressions for the equilibria and explicit
criteria for their stability which suggested rather general conclusions for
other fitness matrices, there was no certainty that the conclusions reached 
were not an artifact of the simple forms assumed for the fitness values, and 
thus no certainty of the generality of the conclusions reached. In this section 
we attempt to overcome these difficulties by presenting explicit conclusions 
for certain special matrices, as well as presenting numerical conclusions 
found from a number of more or less arbitrary fitness matrices. 

We have already noted the additive fitness model (6.21). For the 
multiplicative fitness model the matrix (2.91) appears in the form 



aipi 


& 1 P 2 


&103 






(*202 


Ot 2 P 3 


(6.28) 


asPi 


0^3 02 


®3p3 





while a third special class of fitness matrices is given by the “symmetric 
viability” model, for which the fitnesses are 

1-5 1-/3 1 -a 

1-7 1 1-7 (6.29) 

1 — a 1-/3 1-5 

In the analysis of this model it is usually assumed that a, /3, 7, 5 > 0, and 
we also make this assumption throughout. These models are not mutually 
exclusive: For example, if /3 + 7 = a = 5, the symmetric viability model is 
also an additive model. Several models which do not initially appear in one 
of these forms can in fact be so written by suitable re-parameterization. 
Thus the model 



1 + 5 


1 + t 


1 — 5 




1 


1 + t 


1 


(6.30) 


1 — 5 


1 + t 


1+5 





of Kimura (1956b) can be cast in the form (6.29), after division of all
fitnesses by $1 + t$, by putting $\alpha = (s + t)/(1 + t)$, $\beta = 0$,
$\gamma = t/(1 + t)$, $\delta = (t - s)/(1 + t)$. (Kimura’s analysis of the
model (6.30) marked the beginning of the mathematics of two-locus models as
discussed in this book.) However, a second model studied by Kimura (1956b),
for which the fitness matrix is



$$
\begin{array}{ccc}
1+s & 1+s+t & 1-s \\
1+s & 1+s+t & 1-s \\
1-s & 1-s+t & 1+s
\end{array}
\tag{6.31}
$$





cannot in general be cast in any of the three forms above. Similarly the
model (1.90) does not fall into any of these forms, whereas the models
(1.91) and (1.92) do. In examining both static and dynamic properties of
the additive, multiplicative and symmetric viability models and of various
numerical models we assume throughout that $0 < R \le 0.5$: the case $R = 0$
reduces to a one-locus four-allele model to which the theory of Section 2.4
can be applied, while cases with $R > 0.5$ are not of biological interest.
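The re-parameterization of Kimura's model (6.30) can be checked numerically.
The following is a minimal sketch only: the function names and the values of
$s$ and $t$ are illustrative assumptions, not taken from the text. It divides
every entry of (6.30) by $1 + t$ and compares the result with the matrix
(6.29) evaluated at the parameter values quoted above.

```python
def symmetric_viability(alpha, beta, gamma, delta):
    """Fitness matrix in the layout of (6.29)."""
    return [
        [1 - delta, 1 - beta, 1 - alpha],
        [1 - gamma, 1.0, 1 - gamma],
        [1 - alpha, 1 - beta, 1 - delta],
    ]


def kimura_model(s, t):
    """Kimura's fitness matrix (6.30)."""
    return [
        [1 + s, 1 + t, 1 - s],
        [1.0, 1 + t, 1.0],
        [1 - s, 1 + t, 1 + s],
    ]


def kimura_parameters(s, t):
    """(alpha, beta, gamma, delta) quoted in the text for the model (6.30)."""
    return (s + t) / (1 + t), 0.0, t / (1 + t), (t - s) / (1 + t)


def max_abs_difference(m1, m2):
    """Largest entrywise absolute difference between two 3x3 matrices."""
    return max(abs(a - b) for row1, row2 in zip(m1, m2) for a, b in zip(row1, row2))


s, t = 0.02, 0.05  # illustrative values, not taken from the text
rescaled = [[w / (1 + t) for w in row] for row in kimura_model(s, t)]
target = symmetric_viability(*kimura_parameters(s, t))
difference = max_abs_difference(rescaled, target)
print(difference)  # zero up to rounding error
```

Since only relative fitnesses matter, the rescaling by $1 + t$ does not
change the dynamics of the model.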

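We first take the additive fitness scheme (6.21). Using the fact that
mean fitness is nondecreasing in this model, Karlin and Feldman (1970b)
demonstrated, in the case $\alpha_2 > \alpha_1, \alpha_3$ and
$\beta_2 > \beta_1, \beta_3$, that there exists a unique internal
equilibrium point at

$$
\begin{aligned}
c_1 &= (\alpha_2 - \alpha_3)(\beta_2 - \beta_3)/((2\alpha_2 - \alpha_1 - \alpha_3)(2\beta_2 - \beta_1 - \beta_3)), \\
c_2 &= (\alpha_2 - \alpha_3)(\beta_2 - \beta_1)/((2\alpha_2 - \alpha_1 - \alpha_3)(2\beta_2 - \beta_1 - \beta_3)), \\
c_3 &= (\alpha_2 - \alpha_1)(\beta_2 - \beta_3)/((2\alpha_2 - \alpha_1 - \alpha_3)(2\beta_2 - \beta_1 - \beta_3)), \\
c_4 &= (\alpha_2 - \alpha_1)(\beta_2 - \beta_1)/((2\alpha_2 - \alpha_1 - \alpha_3)(2\beta_2 - \beta_1 - \beta_3)).
\end{aligned}
\tag{6.32}
$$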

Further, this equilibrium is globally stable. The location of this
equilibrium point is what would be expected by a composition of single-locus
analyses, using (1.31). It is independent of $R$ and the coefficient of
linkage disequilibrium is zero at it. Since also mean fitness is
nondecreasing in the additive fitness model considered, one might be tempted
to conclude that for this model the gene frequencies at the two loci evolve
independently, and that for practical purposes the two loci can be studied
separately. This conclusion, however, is not quite correct. If the
coefficient of linkage disequilibrium is zero in any generation it does not
necessarily remain zero in future generations, and this can prevent gene
frequencies converging monotonically to their equilibrium values given in
(6.32).
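The statement that (6.32) is a product of single-locus equilibrium
frequencies, and hence has zero linkage disequilibrium, can be illustrated by
a short computation. This is a sketch under stated assumptions: the function
names and the numerical fitness components are hypothetical, and the
coefficient of linkage disequilibrium is taken to be $D = c_1c_4 - c_2c_3$.

```python
def additive_equilibrium(alpha, beta):
    """Gametic frequencies (c1, c2, c3, c4) given by (6.32).

    alpha = (a1, a2, a3) and beta = (b1, b2, b3) are the additive fitness
    components; (6.32) assumes a2 > a1, a3 and b2 > b1, b3.
    """
    a1, a2, a3 = alpha
    b1, b2, b3 = beta
    if not (a2 > a1 and a2 > a3 and b2 > b1 and b2 > b3):
        raise ValueError("(6.32) assumes overdominance at both loci")
    p = (a2 - a3) / (2 * a2 - a1 - a3)  # single-locus frequency of A1
    q = (b2 - b3) / (2 * b2 - b1 - b3)  # single-locus frequency of B1
    return (p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q))


def linkage_disequilibrium(c):
    """D = c1*c4 - c2*c3 (assumed sign convention)."""
    c1, c2, c3, c4 = c
    return c1 * c4 - c2 * c3


# Hypothetical fitness components, chosen only to satisfy the inequalities.
c = additive_equilibrium(alpha=(0.45, 0.50, 0.40), beta=(0.40, 0.50, 0.48))
print(c, linkage_disequilibrium(c))
```

The recombination fraction does not enter the calculation at all, in
agreement with the remark that the location of (6.32) is independent of $R$.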

If $\alpha_1 > \alpha_2 > \alpha_3$ the frequency of $A_1$ converges to
unity. If at the same time $\beta_2 > \beta_1, \beta_3$ a polymorphism will
be maintained at the $B$ locus in accordance with (1.31). This case and the
corresponding cases where fixation occurs at the $B$ locus or at both loci
essentially reduce, so far as equilibrium properties are concerned, to
single-locus systems, so we consider them no further here. The criteria that
these fixation events occur depend only on the selective parameters
$\{\alpha_i\}$ and $\{\beta_i\}$, and are independent of the recombination
fraction $R$.

An important question concerning the concept of marginal fitnesses arises
with respect to the equilibrium point (6.32). Suppose that we observe the
$A$ locus only and compute the marginal fitnesses at this locus as defined
by (2.96). These marginal fitnesses reduce to $\alpha_1 + \beta$,
$\alpha_2 + \beta$, $\alpha_3 + \beta$ (for some $\beta$), as we might
expect. A parallel observation applies for the $B$ locus. The important
observation for us is that, since in computing (6.32) it has been assumed
that $\alpha_2 > \alpha_1$, $\alpha_2 > \alpha_3$ and $\beta_2 > \beta_1$,
$\beta_2 > \beta_3$, the marginal fitnesses will also exhibit overdominance.
The question of whether marginal overdominance always applies for stable
internal two-locus equilibria is an interesting one, to which we shall
return on several occasions.

We turn now to the multiplicative system (6.28). The properties of this
model are more complex than those for the additive system. We again
suppose that $\alpha_2 > \alpha_1$, $\alpha_2 > \alpha_3$ and
$\beta_2 > \beta_1$, $\beta_2 > \beta_3$. Then for all values of $R$ there
exists an equilibrium point with gametic frequencies given in (6.32).
However, this equilibrium is stable only if $R$ is large enough. Roux
(1974), generalizing the analysis of Bodmer and Felsenstein (1967), found
that stability applies if and only if

$$
R > \frac{(\alpha_2 - \alpha_1)(\beta_2 - \beta_1)(\alpha_2 - \alpha_3)(\beta_2 - \beta_3)}
         {(2\alpha_2 - \alpha_1 - \alpha_3)(2\beta_2 - \beta_1 - \beta_3)\alpha_2\beta_2}.
\tag{6.33}
$$



This is also a sufficient condition for local stability. Moran (1968) found 
that a sufficient condition for global stability of (6.32) is 



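$$
6\left(\tfrac{1}{2} - R\right) < \min\left\{
\frac{(\alpha_2 - \alpha_3)(\alpha_2 - \alpha_1)}{\alpha_2(2\alpha_2 - \alpha_1 - \alpha_3)},\;
\frac{(\beta_2 - \beta_3)(\beta_2 - \beta_1)}{\beta_2(2\beta_2 - \beta_1 - \beta_3)}
\right\}.
\tag{6.34}
$$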



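It is not known how close this is to being a necessary condition. He also
provided two further sufficient conditions, namely

$$
\tfrac{1}{2} - R < \tfrac{1}{2}\min\left(\frac{\alpha_1}{\alpha_2}, \frac{\alpha_3}{\alpha_2}\right),
\qquad
\tfrac{1}{2} - R < \tfrac{1}{2}\min\left(\frac{\beta_1}{\beta_2}, \frac{\beta_3}{\beta_2}\right).
$$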



The requirement (6.33) has been generalized in an elegant way to an
arbitrary number of alleles at the two loci by Roux (1974). Suppose we
associate a multiplicative fitness component $a_{ij}$ with the genotype
$A_iA_j$ at locus $A$ and a multiplicative component $b_{ij}$ with $B_iB_j$
at locus $B$. Suppose further that when treated as single-locus systems,
each of these has an internal equilibrium point where

$$
\text{freq}(A_i) = p_i, \qquad \text{freq}(B_j) = q_j. \tag{6.35}
$$

Then the two-locus system will have an equilibrium point at which
freq$(A_iB_j) = p_iq_j$. We now define the $A$ and $B$ locus equilibrium
mean fitnesses as

$$
w_A = \sum_i \sum_j p_i p_j a_{ij}, \qquad w_B = \sum_i \sum_j q_i q_j b_{ij},
\tag{6.36}
$$

put

$$
c_{ij} = a_{ij}\sqrt{p_i p_j}/w_A, \qquad d_{ij} = b_{ij}\sqrt{q_i q_j}/w_B,
\tag{6.37}
$$

and let $\lambda$, $\psi$ be the largest nonunit eigenvalues respectively of
the matrices $\{c_{ij}\}$, $\{d_{ij}\}$. Then Roux demonstrated that the
condition for the equilibrium at which freq$(A_iB_j) = p_iq_j$ to be stable
is that

$$
R > \lambda\psi/\{(1 - \lambda)(1 - \psi)\}. \tag{6.38}
$$



In the case of two alleles at each locus, we may adopt our previous notation
and put, for the $A$ locus,

$$
a_{11} = \alpha_1, \qquad a_{12} = a_{21} = \alpha_2, \qquad a_{22} = \alpha_3.
$$

Then at an equilibrium point of this locus,

$$
p_1 = (\alpha_2 - \alpha_3)/\{2\alpha_2 - \alpha_1 - \alpha_3\} = 1 - p_2
\tag{6.39}
$$

and hence

$$
w_A = (\alpha_2^2 - \alpha_1\alpha_3)/(2\alpha_2 - \alpha_1 - \alpha_3).
\tag{6.40}
$$

It follows from (6.37)-(6.40) that

$$
c_{11} = \frac{\alpha_1(\alpha_2 - \alpha_3)}{\alpha_2^2 - \alpha_1\alpha_3}, \qquad
c_{12} = c_{21} = \frac{\alpha_2\{(\alpha_2 - \alpha_3)(\alpha_2 - \alpha_1)\}^{1/2}}{\alpha_2^2 - \alpha_1\alpha_3}, \qquad
c_{22} = \frac{\alpha_3(\alpha_2 - \alpha_1)}{\alpha_2^2 - \alpha_1\alpha_3}.
$$

The eigenvalues of the matrix $\{c_{ij}\}$ are easily found to be 1 and

$$
\lambda = -(\alpha_2 - \alpha_3)(\alpha_2 - \alpha_1)/(\alpha_2^2 - \alpha_1\alpha_3).
$$

In a similar way we find for the $B$ locus

$$
\psi = -(\beta_2 - \beta_3)(\beta_2 - \beta_1)/(\beta_2^2 - \beta_1\beta_3).
$$

Insertion of these values into (6.38) gives precisely the condition (6.33). 
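The agreement between (6.38) and (6.33) in the two-allele case can also be
confirmed numerically. The sketch below builds the $2 \times 2$ matrix of
(6.37) for each locus, extracts its nonunit eigenvalue from the trace and
determinant, and compares the right-hand sides of the two conditions. The
function names and the fitness values are illustrative assumptions.

```python
import math


def nonunit_eigenvalue(a1, a2, a3):
    """Nonunit eigenvalue of the 2x2 matrix {c_ij} of (6.37) for one locus."""
    p1 = (a2 - a3) / (2 * a2 - a1 - a3)                  # (6.39)
    p2 = 1 - p1
    w = p1 * p1 * a1 + 2 * p1 * p2 * a2 + p2 * p2 * a3   # (6.36)
    c11 = a1 * p1 / w
    c12 = a2 * math.sqrt(p1 * p2) / w
    c22 = a3 * p2 / w
    trace = c11 + c22
    det = c11 * c22 - c12 * c12
    # The eigenvalues are the roots of z^2 - trace*z + det = 0.
    disc = math.sqrt(trace * trace - 4 * det)
    low, high = (trace - disc) / 2, (trace + disc) / 2
    # One root equals 1 (up to rounding); return the other one.
    return low if abs(high - 1) < abs(low - 1) else high


def roux_threshold(alpha, beta):
    """Right-hand side of (6.38)."""
    lam = nonunit_eigenvalue(*alpha)
    psi = nonunit_eigenvalue(*beta)
    return lam * psi / ((1 - lam) * (1 - psi))


def two_allele_threshold(alpha, beta):
    """Right-hand side of (6.33)."""
    a1, a2, a3 = alpha
    b1, b2, b3 = beta
    numerator = (a2 - a1) * (b2 - b1) * (a2 - a3) * (b2 - b3)
    denominator = (2 * a2 - a1 - a3) * (2 * b2 - b1 - b3) * a2 * b2
    return numerator / denominator


alpha, beta = (0.9, 1.0, 0.8), (0.95, 1.0, 0.7)  # illustrative values only
print(roux_threshold(alpha, beta), two_allele_threshold(alpha, beta))
```

Both eigenvalues are negative under overdominance, so their product, and
hence the critical value of $R$, is positive.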

Suppose now that the condition (6.33) is not met. In the symmetric
case where $\alpha_1 = \alpha_3$, $\beta_1 = \beta_3$ there will now exist
two stable equilibria, both possibly exhibiting large numerical values of
the coefficient of linkage disequilibrium. Thus, for example, if
$\alpha_1 = \alpha_3 = \beta_1 = \beta_3 = 0.99$, $\alpha_2 = \beta_2 = 1$,
and $R = 0.000009$, the two stable equilibria are

$$
\begin{array}{cccc}
c_1 & c_2 & c_3 & c_4 \\
0.45 & 0.05 & 0.05 & 0.45 \\
0.05 & 0.45 & 0.45 & 0.05
\end{array}
\tag{6.41}
$$





Such equilibria arise in multiplicative models only for very tightly linked
loci. However, we shall observe in the next chapter that for interactive
systems involving many loci, rather less stringent conditions on the value
of the recombination fraction still lead to equilibria of the form (6.41),
that is, with $D \ne 0$.
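The numbers in this example can be reproduced directly. The sketch below
evaluates the right-hand side of (6.33) and then locates the two equilibria.
It relies on an identification that is an assumption of the sketch rather
than a statement of the text: in this symmetric case the multiplicative
model can be rescaled into the symmetric viability form (6.29) with
$\delta = \alpha$, so that the equilibria take the form (6.48), with the
critical value of $R$ playing the role of $\frac{1}{4}(\beta + \gamma - \alpha)$.

```python
import math

alpha1 = alpha3 = beta1 = beta3 = 0.99
alpha2 = beta2 = 1.0
R = 0.000009

# Right-hand side of the stability condition (6.33).
critical_R = (
    (alpha2 - alpha1) * (beta2 - beta1) * (alpha2 - alpha3) * (beta2 - beta3)
    / ((2 * alpha2 - alpha1 - alpha3) * (2 * beta2 - beta1 - beta3) * alpha2 * beta2)
)

# Equilibria of the form (6.48); valid here only because R < critical_R.
root = math.sqrt(1 - R / critical_R)
high = 0.25 * (1 + root)
low = 0.25 * (1 - root)
equilibria = [(high, low, low, high), (low, high, high, low)]

# Coefficient of linkage disequilibrium, D = c1*c4 - c2*c3 (assumed convention).
D_values = [c[0] * c[3] - c[1] * c[2] for c in equilibria]
print(critical_R, equilibria, D_values)
```

With these values the critical recombination fraction is $0.000025$, so that
$R = 0.000009$ violates (6.33), and the two equilibria have $D = \pm 0.2$,
close to the extreme values $\pm 0.25$.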

As with the additive case, whenever the conditions $\alpha_2 > \alpha_1$,
$\alpha_2 > \alpha_3$ and $\beta_2 > \beta_1$, $\beta_2 > \beta_3$ do not
both hold, edge or corner equilibria may arise. These are of little interest
to us and we consider them no further.

We turn now to the mean fitness. First, since equilibria of the form 
(6.41) have nonzero linkage disequilibrium values, mean fitness cannot be 
maximized at them. The equilibrium (6.32) is perhaps of greatest interest. 
Surprisingly, the mean fitness is not maximized at this point: Rather, the 
fitness surface has a saddle point at (6.32). These conclusions imply that 
mean fitness can be decreasing in the neighborhood of equilibrium points. 

One special property of multiplicative fitness schemes concerns the
coefficient of linkage disequilibrium $D$. If in any generation the equation
$D = 0$ holds, then the gametic frequency recurrence relations show that $D$
remains at the value 0 for all future generations. In this case the mean
fitness must be nondecreasing, since the recurrence relations then reduce to
single-locus equations. More generally, for $R < 0.5$, the sign of $D$ will
not change throughout the entire evolution of the recurrence process
(Karlin (1975)).

We consider next the value of the equilibrium mean fitness as a function
of $R$. When the inequality (6.33) holds, the equilibrium mean fitness is
independent of $R$, since the location of the equilibrium point is
independent of $R$. When (6.33) does not hold the location of the
equilibrium point does depend on $R$. Since the equilibrium mean fitness
decreases with $R$ for $R \approx 0$, we may suspect that it is
nonincreasing with $R$ for all $R$, and this has been confirmed for this
model by Karlin (1975).

We observe finally that if $\alpha_2 > \alpha_1$, $\alpha_2 > \alpha_3$ and
$\beta_2 > \beta_1$, $\beta_2 > \beta_3$ and an internal equilibrium point,
either (6.32) or of the form typified by (6.41), exists, the marginal
fitness values as computed from (2.97) exhibit overdominance. This is easily
checked for the equilibrium point (6.32), and can also be verified for the
linkage-disequilibrium equilibrium points.

The last special case we consider in some detail is the symmetric viability 
model (6.29). This model possesses several unusual features not shared by 
additive or multiplicative schemes. Perhaps the most important of these 
is the unexpected existence of so-called asymmetric equilibria. The fitness 
matrix (6.29) possesses certain symmetry properties, and because of this 
one might expect that at any equilibrium point, the identities 

$$
c_1^* = c_4^*, \qquad c_2^* = c_3^* \tag{6.42}
$$

will hold. Strangely, while equilibria satisfying (6.42) usually do exist, there 
is a further class of equilibria for which (6.42) does not hold. This fact 
was discovered by Karlin and Feldman (1970a) and revealed previously 
unthought-of complexities in the model (6.29). Suppose for example that 
the fitness matrix (6.29) takes the form 



$$
\begin{array}{ccc}
0.97 & 0.96 & 0.98 \\
0.96 & 1.00 & 0.96 \\
0.98 & 0.96 & 0.97
\end{array}
\tag{6.43}
$$





and that $R = 0.04$. Then the recurrence system (2.94) admits five
equilibrium points and at only one of these, the first noted below, do the
equations in (6.42) hold. The five equilibria are

$$
\begin{array}{cccc}
c_1 & c_2 & c_3 & c_4 \\
0.238 & 0.262 & 0.262 & 0.238 \\
0.154 & 0.656 & 0.036 & 0.154 \\
0.154 & 0.036 & 0.656 & 0.154 \\
0.889 & 0.050 & 0.050 & 0.011 \\
0.011 & 0.050 & 0.050 & 0.889
\end{array}
\tag{6.44}
$$



For smaller values of $R$ the number of symmetric equilibria for which the
equations in (6.42) hold increases to three. Thus for $R = 0.0004$, apart
from four asymmetric equilibria, there are symmetric equilibria at

$$
\begin{array}{cccc}
c_1 & c_2 & c_3 & c_4 \\
0.0034 & 0.4966 & 0.4966 & 0.0034 \\
0.2734 & 0.2266 & 0.2266 & 0.2734 \\
0.4950 & 0.0050 & 0.0050 & 0.4950
\end{array}
\tag{6.45}
$$

This does not automatically occur for every choice of $\alpha$, $\beta$,
$\gamma$, $\delta$. Thus if the inequalities $\beta + \gamma > \alpha$,
$\beta + \gamma > \delta$ do not both hold, there will be only one
equilibrium of the form (6.42), whatever the value of $R$.
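A listed equilibrium can be checked by applying one generation of the
two-locus recurrence to it and confirming that the gametic frequencies are
essentially unchanged. The recurrence system (2.94) is not reproduced in
this section, so the sketch below uses the standard viability-selection form
and assumes that it agrees with (2.94). The ordering of the gametes
($A_1B_1$, $A_1B_2$, $A_2B_1$, $A_2B_2$) and the assignment of $\beta$ and
$\gamma$ to the single heterozygotes are also assumptions, as are all
function names. The check is applied to the symmetric equilibrium listed
for the matrix (6.43) with $R = 0.04$.

```python
def gametic_fitnesses(alpha, beta, gamma, delta):
    """4x4 matrix w[i][j] of fitnesses of the genotype formed by gametes i, j.

    Gametes are ordered A1B1, A1B2, A2B1, A2B2 (assumed ordering); the
    entries follow the symmetric viability scheme (6.29).
    """
    return [
        [1 - delta, 1 - beta, 1 - gamma, 1.0],
        [1 - beta, 1 - alpha, 1.0, 1 - gamma],
        [1 - gamma, 1.0, 1 - alpha, 1 - beta],
        [1.0, 1 - gamma, 1 - beta, 1 - delta],
    ]


def next_generation(c, w, R):
    """One generation of the standard two-locus viability recurrence."""
    marginal = [sum(w[i][j] * c[j] for j in range(4)) for i in range(4)]
    mean_fitness = sum(c[i] * marginal[i] for i in range(4))
    D = c[0] * c[3] - c[1] * c[2]
    sign = (-1.0, 1.0, 1.0, -1.0)   # recombination removes c1, c4 when D > 0
    w14 = w[0][3]                   # fitness of the double heterozygote
    return [(c[i] * marginal[i] + sign[i] * R * w14 * D) / mean_fitness
            for i in range(4)]


w = gametic_fitnesses(alpha=0.02, beta=0.04, gamma=0.04, delta=0.03)  # (6.43)
c = [0.238, 0.262, 0.262, 0.238]
c_next = next_generation(c, w, R=0.04)
residual = max(abs(a - b) for a, b in zip(c, c_next))
print(c_next, residual)
```

The residual is far smaller than the three-decimal rounding of the listed
frequencies. A check of this kind confirms only that a point is an
equilibrium; it says nothing about its stability.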

It is of particular importance for this model to ask two questions
concerning stable equilibrium points. As noted above, up to seven internal
equilibria can exist for certain parameter choices in the model (6.29). We
therefore ask, first, how many of these can be stable, and second, will
there always be at least one stable internal equilibrium, at least for
certain values of $R$? Karlin (1975) claimed that irrespective of $R$ there
can never be more than two stable internal equilibria for the system (6.29).
Thus if seven such equilibria exist, at least five must be unstable. Can we
be guaranteed that at least one internal stable equilibrium exists? Karlin
and Feldman (1970a,b) found that this is so, at least provided $R$ is
sufficiently small (and, as we are assuming, that
$\alpha, \beta, \gamma, \delta > 0$). For larger values of $R$ quite complex
behavior is possible, and this behavior can be described explicitly in the
special case $\delta = \alpha$. We restrict our attention to symmetric
equilibria of the form (6.42), since Karlin and Feldman (1970a) demonstrated
that the asymmetric equilibria are unstable in this case. Here the
equilibrium point $c_i^* = 0.25$ exists for all $R$ but is stable only if

$$
R > \tfrac{1}{4}(\beta + \gamma - \alpha) \tag{6.46}
$$

and

$$
\alpha > |\beta - \gamma|. \tag{6.47}
$$

If the right-hand side in (6.46) is negative and (6.47) holds, this of course
implies stability for all $R$. When the inequality (6.46) does not hold there
will exist two equilibria of the form

$$
\begin{aligned}
c_1^* = c_4^* &= 0.25 \pm 0.25\{1 - 4R/(\beta + \gamma - \alpha)\}^{1/2}, \\
c_2^* = c_3^* &= 0.25 \mp 0.25\{1 - 4R/(\beta + \gamma - \alpha)\}^{1/2}.
\end{aligned}
\tag{6.48}
$$

The condition on the recombination fraction $R$ that these be stable is that
$\phi(R)$, defined by

$$
\phi(R) = 4R^2(\beta + \gamma - \alpha)
        + 2R\{2\alpha^2 - \beta^2 - \gamma^2 - \alpha(\beta + \gamma)\}
        + \alpha(\beta + \gamma - \alpha)^2,
\tag{6.49}
$$

be positive. This requirement is always satisfied for $R$ sufficiently close
to zero, so that the equilibria (6.48) are always stable if linkage is
sufficiently tight. When $R = \frac{1}{4}(\beta + \gamma - \alpha)$, the
upper limit for $R$ for which equilibria of the form (6.48) exist, condition
(6.49) reduces to (6.47). If we denote the solutions of the quadratic
equation

$$
\phi(R) = 0 \tag{6.50}
$$

by $R_1$ and $R_2$ (with $R_1 < R_2$), it follows that the equilibrium point
(6.48) will be stable in the following cases:

(1) if $R_1 > 0$, $|\beta - \gamma| > \alpha$, then (6.48) is stable for
$0 < R < R_1$;

(2) if $R_1 > \frac{1}{4}(\beta + \gamma - \alpha)$, then (6.48) is stable
for $0 < R < \frac{1}{4}(\beta + \gamma - \alpha)$;

(3) if $0 < R_1 < R_2 < \frac{1}{4}(\beta + \gamma - \alpha)$, then (6.48)
is stable for $0 < R < R_1$ and for
$R_2 < R < \frac{1}{4}(\beta + \gamma - \alpha)$.

In the latter case there exists a gap of instability (when $R_1 < R < R_2$)
for which there are no internal stable equilibria. Clearly the stability
behavior, even of the symmetric equilibria, is not straightforward.
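The case analysis above amounts to finding where the quadratic (6.49) is
positive on the interval $0 < R < \frac{1}{4}(\beta + \gamma - \alpha)$. The
following sketch, for the case $\delta = \alpha$, does this from the roots of
(6.50). The function names are hypothetical, and the handling of situations
not listed in the text (no real roots, or two negative roots) is an
assumption that follows only from the criterion $\phi(R) > 0$.

```python
import math


def phi(R, alpha, beta, gamma):
    """The quadratic (6.49), for the case delta = alpha."""
    s = beta + gamma - alpha
    return (4 * R * R * s
            + 2 * R * (2 * alpha ** 2 - beta ** 2 - gamma ** 2 - alpha * (beta + gamma))
            + alpha * s * s)


def symmetric_equilibria(R, alpha, beta, gamma):
    """The pair of equilibria (6.48), or None if they do not exist."""
    s = beta + gamma - alpha
    if s <= 0 or R >= s / 4:
        return None
    root = math.sqrt(1 - 4 * R / s)
    high, low = 0.25 * (1 + root), 0.25 * (1 - root)
    return [(high, low, low, high), (low, high, high, low)]


def stable_intervals(alpha, beta, gamma):
    """Intervals of R within (0, s/4) on which phi(R) > 0."""
    s = beta + gamma - alpha
    if s <= 0:
        return []                      # the equilibria (6.48) never exist
    upper = s / 4
    a = 4 * s
    b = 2 * (2 * alpha ** 2 - beta ** 2 - gamma ** 2 - alpha * (beta + gamma))
    c = alpha * s * s
    disc = b * b - 4 * a * c
    if disc <= 0:
        return [(0.0, upper)]          # phi does not change sign
    R1 = (-b - math.sqrt(disc)) / (2 * a)
    R2 = (-b + math.sqrt(disc)) / (2 * a)
    intervals = []
    if R1 > 0:
        intervals.append((0.0, min(R1, upper)))
    if R2 < upper:
        intervals.append((max(R2, 0.0), upper))
    return intervals


# Case (1) of the text: |beta - gamma| > alpha (illustrative values only).
case_one = stable_intervals(alpha=0.01, beta=0.10, gamma=0.02)
# The symmetric multiplicative example (6.41), rescaled to the form (6.29).
example = stable_intervals(alpha=0.0199, beta=0.01, gamma=0.01)
print(case_one, example)
```

In the first call only an initial interval $0 < R < R_1$ is returned, as in
case (1). In the second call the whole range of existence is returned, in
agreement with the stability of the equilibria (6.41).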

Before leaving symmetric viability models we observe that the model
(1.92) can be cast in the form (6.29) by suitable normalization and
parameterization. It is easy to see that if there exists an equilibrium of
the form (6.42), then $c_1 = c$ satisfies the condition in (1.94). Further
properties of this model follow from the general properties of symmetric
viability models.

Given a point of stable equilibrium, what can be said about the behavior
of the equilibrium mean fitness as a function of $R$? Karlin (1975) asserted
that this mean fitness is nonincreasing in $R$, so that the behavior here
agrees with that for additive and multiplicative models. Further, it can
be shown (Karlin (1975)) that marginal overdominance holds. That is, if
the marginal fitnesses are calculated as in (2.96) for any stable equilibrium
point of a symmetric viability system, the marginal heterozygote fitness,
at both loci, will exceed that of the corresponding homozygotes.

It is convenient now to list the general conclusions and impressions drawn 
from our analyses of additive, multiplicative, and symmetric models. 

(i) Only in a restricted class of models (the additive class and more 
generally the class found by Lyubich (1992)) does the mean fitness 
increase theorem hold. In other models violations of this theorem are 
possible. 

(ii) When the double heterozygote is not the only viable genotype, but
is the most fit genotype, so that $\alpha_2 > \alpha_1$,
$\alpha_2 > \alpha_3$ and $\beta_2 > \beta_1$, $\beta_2 > \beta_3$ in the
additive and multiplicative models, and $\alpha, \beta, \gamma, \delta > 0$
in the symmetric model, there always exists a stable internal equilibrium
point for sufficiently small values of $R$. At least for the symmetric
viability model, the existence of stable equilibria for larger values of $R$
depends on quite complicated criteria involving $R$ and the selective
parameters.

(iii) Stable equilibria with small $R$ often exhibit linkage disequilibrium
whereas stable equilibria with large $R$ often do not.

(iv) For any given value of R there can be at most two stable polymorphic 
equilibria. 

(v) The value of the equilibrium mean fitness decreases (or at worst
remains stable) as $R$ increases, in accordance with the argument of
Fisher outlined in Section 1.5.

(vi) Whenever a stable equilibrium exists, the marginal fitnesses, computed
from (2.96), exhibit induced overdominance.

How general are these conclusions? It may be argued, since they have
been derived from fitness matrices possessing special properties, that they
are artifacts of the special features assumed and do not apply to a wider
class of fitness matrices. To check this, we present conclusions derived
algebraically from other fitness matrices and also numerically from
arbitrary numerical fitness matrices. The latter have been considered in
particular by Karlin (1975) and Karlin and Carmelli (1975), and we draw
heavily on their conclusions in our analysis.

We consider first the mean fitness increase theorem (MFIT). We have
noted that if the coefficient of linkage disequilibrium $D$ is nonzero at a
stable equilibrium point, the MFIT cannot be true as a mathematical theorem.
Now (2.94) shows that the requirement $D = 0$ at equilibrium implies
$w_i = \bar{w}$ $(i = 1, 2, 3, 4)$ at that equilibrium. These equations
imply certain constraints on the fitness parameters which will only hold in
special cases, and we can conclude that in this sense, as a mathematical
statement, the MFIT is false. However, it is perhaps more important to note,
from our analysis of QLE, that the result implied by the theorem is
“usually” true.

Are there any general fitness schemes other than the additive scheme
for which the MFIT is necessarily true? Karlin (1975) found that for the
symmetric viability model with fitness matrix

$$
\begin{array}{ccc}
0 & 1-\beta & 0 \\
1-\gamma & 1 & 1-\gamma \\
0 & 1-\beta & 0
\end{array}
\tag{6.51}
$$

with $\beta + \gamma < 1$, the theorem is necessarily true. This model,
implying lethality of all double homozygotes, may be regarded as an unusual
one, and since no other general class of models implying the mathematical
truth of the MFIT has been found, we may conclude that in practice the
additive model is the only important example where this theorem holds.

We turn next to the question of the existence of stable equilibria when
the double heterozygote is the most fit genotype. In all cases that we
considered above a stable equilibrium was guaranteed for small $R$, and we
now ask whether this is true for arbitrary fitness matrices. Karlin (1975)
demonstrated that this is not so: even when the double heterozygote is the
most fit genotype, it is possible, for some fitness schemes, that no internal
stable equilibrium point exists for any positive recombination rate.

One fitness scheme possessing this property is, in the notation of (2.91), 



$$
\begin{array}{ccc}
0.98 & 0.998 & 0.98 \\
0.976 & 1 & 0.965 \\
0.97 & 0.995 & 0.96
\end{array}
\tag{6.52}
$$





Are there special conditions on the fitness values that ensure a stable
internal equilibrium, at least for small $R$? When all four double
homozygotes are more fit than any single heterozygote but less fit than the
double heterozygote, there are two such equilibria for small $R$. Perhaps of
greater interest is the case where the double homozygotes are the least fit,
with single heterozygotes of intermediate fitness and the double
heterozygote most fit. In this case there exists at least one stable
internal equilibrium for small values of $R$. All these results, together
with details of the method of proof, are given by Karlin (1975).

We turn next to the conclusion that equilibria for small $R$ usually
exhibit linkage disequilibrium, whereas for large $R$ the linkage
disequilibrium is small or even zero. Of course this conclusion is not
uniformly true: for additive models, for example, the (unique) internal
equilibrium (6.32) obtains for all $R$ and exhibits zero linkage
disequilibrium. Nevertheless, as a general statement, this conclusion is
broadly correct. Thus for $R = 0.5$ the fitness matrix (2.96) possesses a
stable equilibrium at the point (2.98), and at this point the linkage
disequilibrium is quite small ($-0.000935$). When $R = 0.001$ there exist
two stable equilibria, at the points

$$
c_1 = 0.447, \quad c_2 = 0.030, \quad c_3 = 0.022, \quad c_4 = 0.500
$$

and

$$
c_1 = 0.015, \quad c_2 = 0.503, \quad c_3 = 0.469, \quad c_4 = 0.013.
$$

At these two points the coefficient of linkage disequilibrium $D$ is 0.223
and $-0.236$ respectively, and clearly these values of $D$ are far larger
than the value at the point (2.98).

Despite these remarks, the relationship between $R$ and the equilibrium
value of $D$ is not quite clear-cut. Franklin and Feldman (1977) provide an
example, quite unexpected in view of our previous conclusions, of a fitness
matrix for which, with certain values of $R$, there exist two stable
equilibria, at one of which $D = 0$ while at the other $D \ne 0$. An example
of a fitness matrix for which this occurs, written in the form (2.91), is

$$
\begin{array}{ccc}
0.78 & 0.82 & 0.77 \\
0.82 & 0.79 & 0.805 \\
0.77 & 0.805 & 0.795844
\end{array}
\tag{6.53}
$$

Here the equilibrium point $c_1 = 0.29989$, $c_2 = 0.143184$,
$c_3 = 0.143184$, $c_4 = 0.683744$, for which $D = 0$, is stable for all $R$
greater than 0.05. At the same time there exists, for each $R$, a second
stable equilibrium for which $D \ne 0$. The location of this equilibrium
depends on $R$. For $R = 0.1$ it is at $c_1 = 0.446233$, $c_2 = 0.223923$,
$c_3 = 0.223923$, $c_4 = 0.105920$, and at this point $D = -0.002877$. On
the other hand, when $R = 0.5$ the equilibrium point is at
$c_1 = 0.443863$, $c_2 = 0.222814$, $c_3 = 0.222814$, $c_4 = 0.110508$, and
at this point $D = -0.000596$. Even more unexpectedly, similar behavior can
arise under multiplicative fitnesses (Karlin and Feldman, unpublished).
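The values of $D$ quoted for the second equilibrium follow directly from the
gametic frequencies. The short sketch below recomputes them, assuming the
convention $D = c_1c_4 - c_2c_3$; the function and variable names are
illustrative.

```python
def linkage_disequilibrium(c1, c2, c3, c4):
    """Coefficient of linkage disequilibrium, D = c1*c4 - c2*c3 (assumed convention)."""
    return c1 * c4 - c2 * c3


# Second stable equilibrium for the fitness matrix (6.53).
D_at_R_01 = linkage_disequilibrium(0.446233, 0.223923, 0.223923, 0.105920)  # R = 0.1
D_at_R_05 = linkage_disequilibrium(0.443863, 0.222814, 0.222814, 0.110508)  # R = 0.5
print(D_at_R_01, D_at_R_05)
```

Both values are negative, and the magnitude of $D$ at $R = 0.5$ is roughly
one fifth of that at $R = 0.1$.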

Next we consider the question of the maximum number of stable internal
equilibria. Karlin (1975) demonstrated that for small values of $R$, no more
than two such equilibria are possible. For larger $R$ values no mathematical
results are available, although the many simulations of Karlin and Carmelli
(1975) and others, for arbitrary fitness matrices, suggest strongly that no
more than two stable equilibria are possible for any value of $R$.

The next point concerns the behavior of the equilibrium mean fitness $w^*$
as a function of $R$. In the special cases examined above, $w^*$ is
nonincreasing with $R$, and this accords with the verbal discussion of
Fisher (1958). Unfortunately, this property does not always hold. While
$w^*$ is locally (and indeed globally) maximized at $R = 0$, it is possible
that for certain ranges of $R$ values, $w^*$ increases as $R$ increases. An
example of a fitness scheme, in the notation (2.91), for which this occurs
is provided by Karlin and Carmelli (1975):

$$
\begin{array}{ccc}
0.462245 & 0.403142 & 0.188776 \\
0.136754 & 0.481281 & 0.391682 \\
0.182915 & 0.245957 & 0.182463
\end{array}
\tag{6.54}
$$

The equilibrium mean fitness, as a function of $R$, is shown in Table 6.2.
In those cases where $w^*$ increases with $R$ over certain $R$ values, the
behavior of $w^*$ as a function of $R$ is often of the form displayed in
this table: an initial decrease in $w^*$ is followed by an increase over a
small range of $R$ values, with an ultimate flattening out of the values of
$w^*$ as $R$ increases to 0.5. If two stable equilibria exist for certain
$R$ values there will exist two curves of $w^*$ against $R$, and it is
possible that both such curves exhibit the form of behavior just described.

The form of fitness scheme where this behavior tends to arise is typified
by the fitness values in (6.54). One double homozygote and the double
heterozygote have larger fitnesses than the remaining genotypes, and the
double heterozygote has the largest fitness. This ensures a stable
equilibrium at $R = 0$ with two gametes only and, by continuity, for $R$
small a stable equilibrium with two gametes predominating in frequency. For
intermediate values of $R$ all genotypes exist at positive frequency, and
because of the lower fitness of most genotypes the equilibrium mean fitness
here takes its minimum value. For large values of $R$ there is again a
fairly high mean fitness. The conditions required for this behavior are
perhaps rather special and, at least in the numerical example given, the
effect of $R$ on $w^*$ is quite small. Nevertheless it is of some interest
to note that in principle at least, this curious behavior can occur.

    R           Equilibrium mean fitness
    0           0.463385
    0.005       0.462887
    0.010       0.462480
    0.015       0.462168
    0.020       0.461958
    0.027       0.461845
    0.030       0.461866
    0.035       0.462003
    0.037       0.462095
    > 0.042     0.462245

Table 6.2. Values of equilibrium mean fitness for the viability matrix (6.54),
for various values of R
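The shape of the curve described above can be read off from the entries of
Table 6.2. The sketch below locates the minimum and confirms that the
tabulated values first decrease and then increase. The variable names are
illustrative, and the final row of the table, given for $R > 0.042$, is
represented here by the single point $R = 0.042$.

```python
# (R, equilibrium mean fitness) pairs transcribed from Table 6.2.
table_6_2 = [
    (0.000, 0.463385), (0.005, 0.462887), (0.010, 0.462480), (0.015, 0.462168),
    (0.020, 0.461958), (0.027, 0.461845), (0.030, 0.461866), (0.035, 0.462003),
    (0.037, 0.462095), (0.042, 0.462245),  # last row stands for R > 0.042
]

values = [w for _, w in table_6_2]
R_min, w_min = min(table_6_2, key=lambda row: row[1])
k = values.index(w_min)

# Strictly decreasing up to the minimum, strictly increasing afterwards.
decreasing_then_increasing = (
    all(values[i] > values[i + 1] for i in range(k))
    and all(values[i] < values[i + 1] for i in range(k, len(values) - 1))
)

spread = max(values) - min(values)
print(R_min, w_min, decreasing_then_increasing, spread)
```

The total variation in the tabulated mean fitness is about 0.0015, which
illustrates how small the effect of $R$ is in this example.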

We turn finally to the question of induced overdominance. In all cases
considered above it can be shown that if a stable internal equilibrium
exists, the marginal fitness at each locus must exhibit overdominance.
Although no numerical counter-example for arbitrary fitness matrices has yet
been found, no proof exists that this behavior applies generally. The
converse of this proposition is false: there may exist an internal
equilibrium exhibiting marginal overdominance which is unstable. Thus
starting near this equilibrium the gamete frequencies will change and this
can produce behavior, if one of the two loci only and its marginal fitnesses
are observed, contrary to that of single-locus theory. An example of this is
given by Ewens and Thomson (1978). For the symmetric fitness matrix
written in the form (2.91), the equilibria (6.48) become

$$
\begin{aligned}
c_1 = c_4 &= 0.25 \pm 0.25\{1 - R/0.0725\}^{1/2}, \\
c_2 = c_3 &= 0.25 \mp 0.25\{1 - R/0.0725\}^{1/2}.
\end{aligned}
$$

These equilibria exist whenever $R < 0.0725$ and, from (6.49), they are
stable whenever $R < 0.05756$. The marginal fitnesses at the $B$ locus
always exhibit overdominance, but those at the $A$ locus do only if
$R < 0.05971$. Thus whenever $0.05756 < R < 0.05971$, both marginal
fitnesses exhibit overdominance at the equilibrium and yet the equilibrium
is not stable.

It will be clear that difficulties arise in enunciating general principles
for equilibria of two-locus systems. While several conclusions, as discussed
above, are generally true, counterexamples are usually possible. These
sometimes refer to cases unlikely to be of biological interest, and it
remains a challenge to discover a principle that describes normal behavior
in biologically realistic circumstances.



6.5 Modifier Theory 

One of the most interesting applications of two- (or multi-) locus theory
arises when one of the loci considered is a modifier locus, that is, a locus
at which the genes present modify in some way the values of various genetic
parameters (for example mutation rate, recombination fraction) at or
between other loci. These other loci are called the primary loci, and our
main interest is to consider the way in which evolutionary processes at
these loci are affected by the existence of the modifier locus.

Two general classes of modifier locus theory may be defined. In the first 
class it is supposed that an individual’s fitness depends in part on his 
genetic constitution at the modifier locus. The modification process for the 
evolution of dominance, mentioned already and discussed in more detail 
below, falls in this class, as the fitnesses in (1.92) show. In the second 
class of modification schemes the fitness of any individual is assumed to be 
independent of his genetic constitution at the modifier locus. Any evolution 
at the modifier locus is then a result of its interaction with the primary loci 
and may then be described as being due to secondary selection. This class 
of modification schemes was introduced into the literature by Nei (1967), 
who considered a modifier controlling the recombination fraction between 
two primary loci. 

Our first example is of the former class and concerns Fisher’s (1928a, b,
1930b, 1931, 1934) theory of the evolution of dominance through the action
of a modifier locus. We have already proposed in (1.90) a fitness scheme for
this situation. A proper description of the joint evolution at the $A$ and
$M$ loci requires consideration of the frequencies of the four gametes
$A_1M_1$, $A_1M_2$, $A_2M_1$ and $A_2M_2$, but, if $s$ is small and the
recombination fraction between primary and modifier locus is not small, no
serious error is made by making the approximation that linkage equilibrium
obtains throughout the joint evolutionary process at the $A$ and the $M$
loci. Under this approximation the frequency of any gamete may be written as
the product of the frequencies of the two constituent alleles. A more
refined analysis (Feldman and Karlin (1971)) confirms that the error
involved in making this approximation is negligible.

Suppose that the frequency of $A_1$ is close to unity and that $A_1$ mutates
to $A_2$ at rate $u$. Then under our simplifying assumptions, if $x$ is the
frequency of $A_1$ and $y$ that of $M_1$,

$$
\Delta y \approx sx(1 - x)y(1 - y)\{4ky + 2h - 2hy - 2k\}, \tag{6.56}
$$

$$
\Delta x \approx sx(1 - x)\{1 - x + (2x - 1)(1 - y)(2ky + h - hy)\} - ux. \tag{6.57}
$$

The final term in (6.57) does not of course come from (2.94) but arises
from the recurrent mutation from $A_1$ to $A_2$. Clearly both the frequency
$x$ and the frequency $y$ change over successive generations, and our aim is
to use (6.56) and (6.57) to find an expression for the change $\Delta y$
that is independent of $x$. Before doing this we observe from (1.34), with a
slight change of notation, that the equilibrium value of $x$ when $y = 0$ is
$1 - u/(sh)$ and when $y = 1$ is $1 - (u/s)^{1/2}$. Thus $x(1 - x)$ always
lies in the interval

$$
\left((u/sh)(1 - u/sh),\; (u/s)^{1/2}\{1 - (u/s)^{1/2}\}\right), \tag{6.58}
$$

and is consequently always bounded above by $(u/s)^{1/2}$. This implies that
$\Delta y$ is always bounded above by

$$
(us)^{1/2}y(1 - y)\{4ky + 2h - 2hy - 2k\}. \tag{6.59}
$$

For $k = \frac{1}{2}h$ this yields the upper bound

$$
\Delta y < (us)^{1/2}hy(1 - y), \tag{6.60}
$$

while for $k = 0$ it yields the upper bound

$$
\Delta y < 2h(us)^{1/2}y(1 - y)^2. \tag{6.61}
$$

One way of obtaining a more accurate assessment of the value of $\Delta y$
would be to use (6.56) and (6.57) to form the differential equation

$$
\frac{dy}{dx} = \frac{s(1 - x)y(1 - y)\{4ky + 2h - 2hy - 2k\}}
                     {s(1 - x)\{1 - x + (2x - 1)(1 - y)(2ky + h - hy)\} - u}.
\tag{6.62}
$$

If this equation could be solved for $x$ (as a function of $y$), the
resulting solution could be inserted in (6.56) to obtain a very accurate
value for $\Delta y$. Unfortunately no solution of (6.62) has yet been
found, and the best that appears possible in this direction is to solve
(6.62) numerically. This would, however, be equivalent to a joint numerical
solution of (6.56) and (6.57).
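A joint numerical solution of (6.56) and (6.57) is straightforward to
sketch. The code below iterates the two approximate recurrences and records
how close $\Delta y$ comes to the upper bound (6.59). The function names,
the parameter values and the starting point are illustrative assumptions and
are not taken from the text; the starting value of $x$ is the equilibrium
value $1 - u/(sh)$ for $y = 0$.

```python
def step(x, y, s, h, k, u):
    """One generation of the approximate recurrences (6.56) and (6.57)."""
    dy = s * x * (1 - x) * y * (1 - y) * (4 * k * y + 2 * h - 2 * h * y - 2 * k)
    dx = (s * x * (1 - x) * (1 - x + (2 * x - 1) * (1 - y) * (2 * k * y + h - h * y))
          - u * x)
    return x + dx, y + dy, dy


def iterate(x0, y0, s, h, k, u, generations):
    """Iterate (6.56)-(6.57); return the path and the largest dy / bound (6.59)."""
    x, y = x0, y0
    path = [(x, y)]
    worst_ratio = 0.0
    for _ in range(generations):
        bound = ((u * s) ** 0.5 * y * (1 - y)
                 * (4 * k * y + 2 * h - 2 * h * y - 2 * k))
        x, y, dy = step(x, y, s, h, k, u)
        if bound > 0:
            worst_ratio = max(worst_ratio, dy / bound)
        path.append((x, y))
    return path, worst_ratio


s, h, u = 0.1, 0.5, 1e-4   # illustrative values, not taken from the text
k = h / 2                  # the case k = h/2
x0 = 1 - u / (s * h)       # equilibrium value of x when y = 0
path, worst_ratio = iterate(x0, 0.01, s, h, k, u, generations=5000)
print(path[-1], worst_ratio)
```

In runs of this kind the ratio of $\Delta y$ to the bound stays below unity,
and the frequency $y$ of the modifier changes only very slowly.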

A slightly less accurate method is to argue as follows. For any fixed value
of $y$ there will exist an equilibrium value of $x$, somewhere in the
interval (6.58). This value is found by solving the equation $\Delta x = 0$.
Although we cannot expect $x$ to have reached this equilibrium value for any
current value of $y$, a reasonable approximation is obtained by assuming
that it has. The solution of the equation $\Delta x = 0$, that is of the
equation

$$
(1 - x)^2 + (1 - x)(2x - 1)(1 - y)(2ky + h - hy) = u/s, \tag{6.63}
$$

is not as straightforward as might initially appear. Since $x \approx 1$ one
might be tempted to ignore the term in $(1 - x)^2$ and, putting
$2x - 1 \approx 1$, to write

$$
1 - x \approx (u/s)\{(1 - y)(2ky + h - hy)\}^{-1}.
$$

Insertion of this in (6.56) gives, for the important particular case
$k = h/2$,

$$
\Delta y \approx uy, \tag{6.64}
$$

which is the value given by Fisher (1928b). Suppose we write
$\Delta y = y(1 - y)\phi(y)$, and (with Fisher) call $\phi(y)$ the
“selective intensity in favor of the modifier”. Then (6.64) shows that

$$
\phi(y) \approx u(1 - y)^{-1} \tag{6.65}
$$

so that $\phi(y)$ becomes indefinitely large as $y \to 1$. This conclusion
was stressed by Fisher (1928c) as an essential point of his argument, since
it appears to imply a strong selective force on the modifying allele.

The above analysis, however, is incorrect. This can be seen immediately
by observing that it leads to a violation of the upper bound (6.60). For
$y \approx 1$ the term $(1 - x)^2$ becomes the dominant term on the left-hand
side of (6.63), and it therefore should not be ignored when $y \approx 1$. It
is, however, possible to replace the term $2x - 1$ in (6.63) by 1 with little
loss of accuracy, since the error involved in doing this is of order
$(1 - x)^2(1 - y)$ and is thus always extremely small. Solving the resulting
equation gives

$$1 - x = \tfrac{1}{2}\Big(-(1 - y)(2ky + h - hy) + \{(1 - y)^2(2ky + h - hy)^2 + 4u/s\}^{1/2}\Big),$$

and insertion of this value for $1 - x$ in (6.56) shows that to a very close
approximation,

$$\Delta y \approx \tfrac{1}{2}sy(1 - y)(4ky + 2h - 2hy - 2k) \times \Big(-(1 - y)(2ky + h - hy) + \{(1 - y)^2(2ky + h - hy)^2 + 4u/s\}^{1/2}\Big). \tag{6.66}$$

For the special case $k = h/2$ this becomes

$$\Delta y \approx \tfrac{1}{2}shy(1 - y)\Big(-h(1 - y) + \{h^2(1 - y)^2 + 4u/s\}^{1/2}\Big). \tag{6.67}$$

Since

$$-h(1 - y) + \{h^2(1 - y)^2 + 4u/s\}^{1/2} < (4u/s)^{1/2},$$

the approximation (6.67) gives

$$\Delta y < (us)^{1/2}\, hy(1 - y), \tag{6.68}$$

in agreement with (6.60). Defining $\phi(y)$ as above we get, to a close
approximation,

$$\phi(y) < (us)^{1/2}\, h. \tag{6.69}$$

Parallel calculations for the case $k = 0$ show that, to a close approximation,

$$\Delta y < y(1 - y)\{1.24\, h^{1/2} u^{3/4} s^{1/4}\} \tag{6.70}$$

for this case. Similar expressions arise for other values of $k$. Numerical
calculations, derived from the recurrence relations governing gamete
frequencies, show (Ewens (1967)) that the approximations (6.67)-(6.70) are
very accurate, and confirm our belief that linkage equilibrium can be assumed
to a close approximation throughout this process.
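
The comparison between Fisher's value (6.64), the corrected approximation
(6.66) and the bound (6.59) is easy to check numerically. The following is a
minimal sketch only: the function names and the parameter values
($u = 10^{-6}$, $s = 0.01$, $h = 0.5$) are illustrative assumptions, not taken
from the text. The last function maximizes the $k = 0$ form of (6.66) over a
grid of $y$ values to recover the numerical constant in (6.70).

```python
import math


def selection_factor(y, h, k):
    """The bracketed factor {4ky + 2h - 2hy - 2k} of (6.59) and (6.66)."""
    return 4.0 * k * y + 2.0 * h - 2.0 * h * y - 2.0 * k


def delta_y_fisher(y, u):
    """Fisher's value (6.64), which applies to the case k = h/2."""
    return u * y


def delta_y_corrected(y, u, s, h, k):
    """Approximation (6.66): retains (1 - x)^2 but puts 2x - 1 = 1 in (6.63)."""
    b = (1.0 - y) * (2.0 * k * y + h - h * y)
    one_minus_x = 0.5 * (-b + math.sqrt(b * b + 4.0 * u / s))
    return s * y * (1.0 - y) * selection_factor(y, h, k) * one_minus_x


def delta_y_bound(y, u, s, h, k):
    """Upper bound (6.59)."""
    return math.sqrt(u * s) * y * (1.0 - y) * selection_factor(y, h, k)


def k0_coefficient(u, s, h, grid=20000):
    """Largest value over a grid of y of (6.66) with k = 0, divided by
    y(1 - y) h^(1/2) u^(3/4) s^(1/4); this is the constant in (6.70)."""
    scale = math.sqrt(h) * u ** 0.75 * s ** 0.25
    best = 0.0
    for i in range(1, grid):
        y = i / grid
        ratio = delta_y_corrected(y, u, s, h, 0.0) / (y * (1.0 - y) * scale)
        best = max(best, ratio)
    return best


if __name__ == "__main__":
    u, s, h = 1e-6, 0.01, 0.5  # illustrative values only
    for y in (0.5, 0.9, 0.999):
        print(y,
              delta_y_fisher(y, u),
              delta_y_corrected(y, u, s, h, h / 2),
              delta_y_bound(y, u, s, h, h / 2))
    print(k0_coefficient(u, s, h))
```

For intermediate $y$ the two approximations nearly coincide, while for $y$
close to 1 Fisher's value exceeds the bound and the corrected value does not.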

What value do the above calculations have for the evolution of dominance
question? It is clear that $\phi(y)$ is always very small, and certainly
does not become infinitely large, as Fisher’s analysis claimed. It is
essentially because of this observation that Wright (1929a,b) originally cast
doubt on Fisher’s theory. We have noted, in Chapter 1, Wright’s emphasis
on the pleiotropic effects of genes. If, in line with his view, the modifier
gene is subject to a primary selection pressure quite independent of its
modification action, the very small selective pressure due to dominance
modification cannot control its evolution and is essentially irrelevant.
Fisher (1929) resists this viewpoint, but, quite apart from the bias in his
argument induced by the mathematical error noted above, this author finds his
position unconvincing.

The above analysis has assumed that from the start, the favored primary
allele $A_1$ is always at a high frequency and the small effect of dominance
modification is in large part due to the very low frequency of heterozygotes
$A_1A_2$ throughout the process. In some cases, for example with the
classic evolution of the melanic form in the moth Biston betularia due to
industrial pollution, the eventually favored form starts at a low frequency.
Thus during the course of its frequency increase many heterozygotes will
appear upon which the force of dominance modification can act. The extent
to which the frequency of the modifier is changed in this way depends on
the degree of linkage between primary and modifying loci. If the two loci
are closely linked there is some possibility for a substantial increase in the
frequency of the modifier, and this tendency is magnified the larger the
primary locus selective differences are. At the same time, once the frequency
of $A_1$ reaches a high value the induced selective force on the modifier
becomes very weak and the argument of Wright outlined above will again
prevail.

We turn now to other ways in which modifier loci can act, considering in 
particular modification of linkage and mutation rate. In this way we wish 
to give an explanation for the evolution of these characteristics which is 
independent of arguments involving inter-population competition at least 
implicit in the early discussions of them. The classic papers introducing 
modifiers which do not change fitnesses were those of Nei (1967, 1969), who 
considered modification of linkage. We follow here, however, the discussion 
of this topic given by Feldman (1972). 

Consider two primary loci $A$ and $B$ and a modifier locus $M$, the effect
of which is to influence the recombination fraction between $A$ and $B$ loci.
Let these loci lie on a chromosome in the order $MAB$, with recombination
values $R$ between $M$ and $A$ loci, $R_{ij}$ between $A$ and $B$ loci and
$R + R_{ij} - 2RR_{ij}$ between $M$ and $B$ loci. Here $R_{ij}$ depends on the
genotype $M_iM_j$ at the modifier locus.

Our mode of approach is the following. Suppose the various genotypes at
$A$ and $B$ loci have fitnesses specified by (2.90). If all individuals at the
$M$ locus are $M_1M_1$ the recombination fraction between $A$ and $B$ loci is
$R_{11}$, and we suppose that the population has reached a stable equilibrium
point for this recombination fraction with gamete frequencies $c_1^*$, $c_2^*$,
$c_3^*$ and $c_4^*$.

Suppose next that the allele $M_2$ is now introduced at a low frequency at
the $M$ locus. The frequencies of the gametes $M_1A_1B_1$, $M_1A_1B_2$,
$M_1A_2B_1$ and $M_1A_2B_2$ then become $c_1^* + \delta_1$, $c_2^* + \delta_2$,
$c_3^* + \delta_3$, $c_4^* + \delta_4$, and we write the frequencies of
$M_2A_1B_1$, $M_2A_1B_2$, $M_2A_2B_1$ and $M_2A_2B_2$ as $\delta_5$, $\delta_6$,
$\delta_7$, $\delta_8$. We now set up recurrence relations extending (2.94) for
the eight gamete frequencies which, if terms in $\delta_i^2$, $\delta_i\delta_j$
are ignored, become linear recurrence relations in $\delta_5$, $\delta_6$,
$\delta_7$ and $\delta_8$. If all the eigenvalues of the matrix governing this
recurrence system are less than unity in absolute value, then
$\delta_i \to 0$ ($i = 5, \ldots, 8$) and the system returns to its original
equilibrium with $M_2$ absent. If at least one eigenvalue is greater than
unity in absolute value the frequency of $M_2$ will increase, and the
recombination fraction between $A$ and $B$ loci, for those individuals
carrying the $M_2$ gene, will change. Clearly our objective is to find the
circumstances, in terms of these eigenvalues, which lead to this increase.

In general this proves to be rather difficult, although we present later a
general argument that at least suggests what the nature of these eigenvalues
is. In the additive, multiplicative and symmetric viability models, Feldman
showed that provided $c_1^*c_4^* \neq c_2^*c_3^*$ the frequency of $M_2$ will
increase if and only if $R_{12} < R_{11}$, that is if and only if the modifier
heterozygote $M_1M_2$ leads to tighter linkage between $A$ and $B$ loci than
does $M_1M_1$.

There is no reason to suppose that this conclusion does not apply for
general fitness matrices. That this is so is suggested by a conclusion of
Feldman and Krakauer (1976). Let the fitness matrix (2.90) at $A$ and $B$
loci be arbitrary and suppose, in the above notation, that
$R_{12} < R_{11}, R_{22}$. Now define

$$m_1 = (R_{22} - R_{12})/(R_{11} + R_{22} - 2R_{12}) = 1 - m_2,$$

$$R^* = m_1^2R_{11} + 2m_1m_2R_{12} + m_2^2R_{22}.$$

Suppose now that $(c_1^*, c_2^*, c_3^*, c_4^*)$ is a solution of the equilibrium
equations (2.95) if $R = R^*$. Then Feldman and Krakauer demonstrated that
there is an equilibrium of the three-locus system at which

$$\text{freq}(M_iA_1B_1) = m_ic_1^*, \quad \text{freq}(M_iA_1B_2) = m_ic_2^*,$$

$$\text{freq}(M_iA_2B_1) = m_ic_3^*, \quad \text{freq}(M_iA_2B_2) = m_ic_4^*,$$

for $i = 1, 2$. The stability properties of this equilibrium have not yet been
determined but, if this equilibrium is stable at least for certain
recombination values, it strongly suggests the evolution of tighter linkage
between $A$ and $B$ loci by secondary selection. We note in passing the
curious resemblance between the formula for $m_1$ and the equilibrium gene
frequency (1.31) in a one-locus selective scheme.
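
As a small numerical illustration of the quantities $m_1$, $m_2$ and $R^*$,
the sketch below evaluates them for hypothetical recombination values. It
assumes that $R^*$ is the quadratic form
$m_1^2R_{11} + 2m_1m_2R_{12} + m_2^2R_{22}$; the function names and the
numbers used are illustrative assumptions. Under that assumption, and when
$R_{12} < R_{11}, R_{22}$, the value $m_1$ is also the modifier frequency that
minimizes the population mean recombination fraction, which is the analogue
of the one-locus resemblance noted above.

```python
def modifier_equilibrium(r11, r12, r22):
    """Return (m1, m2, R*) for recombination fractions R11, R12, R22.

    Assumes R12 < R11 and R12 < R22, so that 0 < m1 < 1.
    """
    denom = r11 + r22 - 2.0 * r12
    m1 = (r22 - r12) / denom
    m2 = 1.0 - m1
    r_star = m1 * m1 * r11 + 2.0 * m1 * m2 * r12 + m2 * m2 * r22
    return m1, m2, r_star


def mean_recombination(m, r11, r12, r22):
    """Mean recombination fraction when M1 has frequency m and the modifier
    genotypes are in Hardy-Weinberg proportions."""
    return m * m * r11 + 2.0 * m * (1.0 - m) * r12 + (1.0 - m) ** 2 * r22


if __name__ == "__main__":
    r11, r12, r22 = 0.10, 0.02, 0.06  # hypothetical values
    m1, m2, r_star = modifier_equilibrium(r11, r12, r22)
    print(m1, m2, r_star)
    # Nearby modifier frequencies give a larger mean recombination fraction.
    print(mean_recombination(m1 - 0.05, r11, r12, r22),
          mean_recombination(m1 + 0.05, r11, r12, r22))
```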

We turn now to modification of mutation rate. Suppose the fitnesses at
the primary locus $A$ are given by (1.25b) with $s > sh > 0$; cases with
complete dominance in fitness can be considered similarly. Suppose that
the mutation rate $A_1 \to A_2$ is controlled by the genes present at a
modifier locus and is $u_{ij}$ for individuals of genotype $M_iM_j$. We may
suppose that initially the frequency of $M_1$ is unity and that the frequency
of $A_1$ is at the mutation-selection equilibrium value
$1 - u_{11}/\{s(1 - h)\}$. The population mean fitness is then $1 - 2u_{11}$.
The allele $M_2$ is now introduced at a low frequency. By considering
linearized recurrence relations for the four gametic types, it is found that
the frequency of $M_2$ increases if and only if $u_{12} < u_{11}$. More
generally the frequency of $M_2$ will steadily increase to unity if
$u_{22} < u_{12} < u_{11}$, so that the mutation rate $A_1 \to A_2$ becomes
$u_{22}$ and the population mean fitness becomes $1 - 2u_{22}$. If
$u_{12} < u_{11}, u_{22}$ a polymorphism is established at the $M$ locus. All
these conclusions are true irrespective of the linkage arrangement between
primary and modifier loci. Again, we attempt to restate these conclusions
later as particular applications of a general modifier principle.
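
The two equilibrium quantities quoted above can be checked by iterating a
deterministic recurrence. The sketch below is illustrative only: the fitness
scheme (1.25b) is not reproduced in this section, so the code assumes
fitnesses 1, $1 - s(1 - h)$ and $1 - s$ for $A_1A_1$, $A_1A_2$ and $A_2A_2$,
an assumption chosen because it is consistent with the equilibrium value
$1 - u/\{s(1 - h)\}$. The function name and the parameter values are likewise
hypothetical.

```python
def mutation_selection_equilibrium(u, s, h, generations=5000):
    """Iterate selection followed by one-way mutation A1 -> A2 at rate u.

    Assumed fitnesses: A1A1 = 1, A1A2 = 1 - s(1 - h), A2A2 = 1 - s.
    Returns the frequency x of A1 and the mean fitness at the end of the run.
    """
    t = s * (1.0 - h)          # heterozygote disadvantage
    x = 1.0                    # start with A1 fixed
    for _ in range(generations):
        q = 1.0 - x
        w_bar = x * x + 2.0 * x * q * (1.0 - t) + q * q * (1.0 - s)
        x_after_selection = x * (x + q * (1.0 - t)) / w_bar
        x = x_after_selection * (1.0 - u)   # mutation removes A1 genes
    q = 1.0 - x
    w_bar = x * x + 2.0 * x * q * (1.0 - t) + q * q * (1.0 - s)
    return x, w_bar


if __name__ == "__main__":
    u, s, h = 1e-5, 0.1, 0.5   # illustrative values only
    x, w_bar = mutation_selection_equilibrium(u, s, h)
    print(1.0 - x, u / (s * (1.0 - h)))   # frequency of A2 against u/{s(1-h)}
    print(1.0 - w_bar, 2.0 * u)           # fitness loss against 2u
```

The fitness loss depends on the mutation rate only, so a modifier that lowers
$u$ raises the equilibrium mean fitness whatever the values of $s$ and $h$.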

Of course the question of the establishment of an optimal mutation rate 
requires arguments more complex than these, and must take into account 
the need for long-term flexibility perhaps enhanced by a higher mutation 
rate. These matters were discussed in Chapter 1, and our interest in them 
here is that inter-population selection arguments are not required to arrive 
at an agency reducing the mutation rate. 

It is also possible to discuss the dynamics of modifiers of sex-ratio,
migration and selfing (Feldman and Krakauer (1976), Karlin and McGregor
(1974)). Instead of discussing these cases specifically, we turn instead to
the question of whether there exists a general principle for modifier loci
embracing the conclusions just reached as particular cases. Such a general
principle has been proposed by Karlin and McGregor (1974). Although this
principle does not have the status of a mathematical theorem, it nevertheless
applies widely for modifier loci. Consider a primary locus (or loci) with
evolution determined at least in part by a parameter $\theta$, for example a
mutation rate. Suppose the value of $\theta$ is determined by a selectively
neutral modifier locus and that for individuals of genotype $M_iM_j$
($i, j = 1, 2$) the parameter takes the value $\theta_{ij}$. Assume that random
mating obtains, and let $w(\theta_{ij})$ be the mean fitness of the primary
system at a stable equilibrium when $\theta = \theta_{ij}$. Then if

$$w(\theta_{11}) < w(\theta_{12}) < w(\theta_{22}) \tag{6.71}$$

the allele $M_2$ will become fixed at the modifier locus. If the inequalities
in (6.71) are reversed, $M_1$ will become fixed, while if
$w(\theta_{12}) > w(\theta_{11}), w(\theta_{22})$ a stable polymorphism will
arise at the modifier locus. Thus this principle essentially asserts that the
evolution at the modifier locus is such as to maximize mean fitness, and this
has been observed in the specific examples above. We have proved earlier that
in the multiplicative and symmetric viability models the mean fitness is
nonincreasing in $R$, and this principle then indicates that modifiers
decreasing the recombination fraction become fixed. This is in agreement with
the conclusion reached by Feldman (1972) described above, and generalizes
that conclusion by not restricting attention to the frequency of the modifier
$M_2$ when it is small. This principle also suggests that there are
circumstances where increased recombination between loci is favored, namely
those in which the equilibrium mean fitness increases with $R$.
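
The case analysis of the Karlin and McGregor principle can be written down
directly. The sketch below is illustrative: the function name and the return
strings are assumptions, and the example uses the mean fitness $1 - 2u_{ij}$
from the mutation rate discussion above as the function $w(\theta_{ij})$.
Since the principle is not a theorem, the output should be read as the
outcome the principle predicts, not as a proved result.

```python
def predicted_modifier_outcome(w11, w12, w22):
    """Outcome predicted by the principle from the equilibrium mean fitnesses
    w(theta_11), w(theta_12), w(theta_22) of the primary system."""
    if w11 < w12 < w22:
        return "M2 becomes fixed"
    if w11 > w12 > w22:
        return "M1 becomes fixed"
    if w12 > w11 and w12 > w22:
        return "stable polymorphism at the modifier locus"
    return "case not covered by the principle as stated"


def mean_fitness_mutation(u):
    """Equilibrium mean fitness 1 - 2u under mutation-selection balance."""
    return 1.0 - 2.0 * u


if __name__ == "__main__":
    # Hypothetical mutation rates u11 > u12 > u22: M2 lowers the mutation rate.
    rates = (1e-5, 5e-6, 1e-6)
    print(predicted_modifier_outcome(*(mean_fitness_mutation(u) for u in rates)))
    # Heterozygote with the lowest mutation rate.
    rates = (1e-5, 1e-6, 5e-6)
    print(predicted_modifier_outcome(*(mean_fitness_mutation(u) for u in rates)))
```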



6.6 Two-Locus Diffusion Processes 

In this section we consider multidimensional diffusion analogues to the
two-locus two-allele Markov chain models (3.130) and (3.131), using throughout
this section the same notation as that used in Section 3.7. The evolution of
the models described by (3.130) and (3.131) can be described by considering
the linearly independent frequencies $c_1$, $c_2$ and $c_3$. However, there is
an obvious asymmetry in doing this, and in any event we find it more
convenient to work with the generation $t$ frequencies $x(t) = c_1 + c_2$,
$y(t) = c_1 + c_3$ and $D(t) = c_1c_4 - c_2c_3$. To use the diffusion theory of
Section 5.1 we must assume that means, variances and covariances of the
changes of these frequencies between consecutive generations are of order
$N^{-1}$. This will require us to assume in particular that $R$ is
$O(N^{-1})$.

We suppose first that there is no mutation, and consider initially the
RUZ model. Then from (3.139),

$$E\{D(t + 1) - D(t) \mid a(t)\} = -(2N)^{-1}\{1 + 2NR\}D(t). \tag{6.72}$$

If $R$ is of order $N^{-1}$ this gives a drift coefficient for $D$ that is
proportional to $-(1 + 2NR)D$, and it is found that the same value applies for
the RUG model. Indeed it is found that all the coefficients for the diffusion
process approximating the RUG model are the same as for the RUZ model, so that
the diffusion approximations to the two processes are identical.

$\gamma$               0.5       5         10        25
$\lambda_1$            -0.813    -0.9926   -0.9978   -0.9996
$\exp(\lambda_1/50)$   0.984     0.980     0.980     0.980

Table 6.3. The largest solution ($\lambda_1$) of (6.74) for various values of $\gamma$

Using (6.72) and the corresponding values for $p$ and $q$, as well as the
variance and covariance terms for these quantities, it is found for both
models that if we scale time so that unit time corresponds to $N$ generations
of the Markov chain, the backward Kolmogorov equation for the joint density
of the frequency of $A_1$, the frequency of $B_1$, and the linkage
disequilibrium at time $t$, when there is no selection, and given the initial
values $p$, $q$ and $D$ for these variables, becomes (Ohta and Kimura, 1969a)

$$\begin{aligned}
\frac{\partial f}{\partial t} ={}& \tfrac{1}{4}p(1 - p)\frac{\partial^2 f}{\partial p^2} + \tfrac{1}{4}q(1 - q)\frac{\partial^2 f}{\partial q^2} + \tfrac{1}{2}D\frac{\partial^2 f}{\partial p\,\partial q} + \tfrac{1}{2}D(1 - 2p)\frac{\partial^2 f}{\partial p\,\partial D} \\
&+ \tfrac{1}{2}D(1 - 2q)\frac{\partial^2 f}{\partial q\,\partial D} + \tfrac{1}{4}\{pq(1 - p)(1 - q) + D(1 - 2p)(1 - 2q) - D^2\}\frac{\partial^2 f}{\partial D^2} \\
&- \tfrac{1}{2}(1 + 2NR)D\frac{\partial f}{\partial D}.
\end{aligned} \tag{6.73}$$



Here $f$ is the joint density function of $x(t)$, $y(t)$ and $D(t)$, given
initial values $p$, $q$ and $D$ for these frequencies. Our aim is to use this
equation, in conjunction with the theory of Section 4.10, to find diffusion
analogues for various quantities established for the Markov chain models of
Section 3.9. We focus here on the diffusion process eigenvalues and the
diffusion process expectation of $\{D(t)\}^2$. These have been found by Ohta
and Kimura (1969a), and we follow their analysis closely.

Our point of departure for both problems is to consider the expectations
of the three quantities $x(t)(1 - x(t)) \times y(t)(1 - y(t))$,
$D(t)(1 - 2x(t))(1 - 2y(t))$, and $D(t)^2$. Simultaneous equations for these
expected values can be found by using (4.83). By inserting trial eigenvalue
solutions with undetermined coefficients, Ohta and Kimura found that the
expectations of these quantities converge to zero at rate
$\exp(\lambda_1t)$, where $\lambda_1$ is negative and is the largest root of
the equation

$$4\lambda^3 + (20 + 12\gamma)\lambda^2 + (27 + 38\gamma + 8\gamma^2)\lambda + (9 + 26\gamma + 8\gamma^2) = 0. \tag{6.74}$$

In this equation $\gamma = NR$ and from our assumptions is $O(1)$. While an
explicit solution of this cubic equation is possible, it is perhaps preferable
to solve (6.74) numerically for selected values of $\gamma$, and some specific
solutions are given in Table 6.3.
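
A numerical solution of (6.74) takes only a few lines. The sketch below is
one possible approach, not the method used to produce Table 6.3: it uses
bisection on the interval $(-1, 0)$, relying on the fact that the left-hand
side of (6.74) equals $-2$ at $\lambda = -1$ for every $\gamma$ and is positive
at $\lambda = 0$. Because all coefficients are positive there is no
non-negative root, and the coefficient sum rules out three roots in
$(-1, 0)$, so the root found is the largest one. The function names are
illustrative, and the divisor 50 is taken from the last row of Table 6.3.

```python
import math


def cubic(lam, gamma):
    """Left-hand side of (6.74)."""
    return (4.0 * lam ** 3
            + (20.0 + 12.0 * gamma) * lam ** 2
            + (27.0 + 38.0 * gamma + 8.0 * gamma ** 2) * lam
            + (9.0 + 26.0 * gamma + 8.0 * gamma ** 2))


def leading_eigenvalue(gamma, tol=1e-12):
    """Largest root of (6.74), found by bisection on (-1, 0)."""
    lo, hi = -1.0, 0.0            # cubic(lo) = -2 < 0 < cubic(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cubic(mid, gamma) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for gamma in (0.5, 5.0, 10.0, 25.0):
        lam = leading_eigenvalue(gamma)
        print(gamma, round(lam, 4), round(math.exp(lam / 50.0), 3))
```

The printed values can be set against the corresponding columns of Table 6.3.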

It is naturally of interest to compare these values with those in Table
3.1. Since unit time in the diffusion corresponds to $N$ generations in the
Markov chain, we should compare $\exp(\lambda_1t)$ with $\mu^{Nt}$, where $\mu$
is the Markov chain eigenvalue; equivalently, $\exp(\lambda_1/50)$ can be
compared to the final line in Table 3.1. It will be noted that the agreement
is excellent, thus showing that the leading eigenvalue for the Markov chain
is closely approximated by that for the diffusion process.

The value of $E\{D(t)\}^2$ turns out to be a complicated expression involving
all three eigenvalue solutions of (6.74). We do not give an explicit
expression here; it is sufficient to note that the value obtained is in
excellent agreement with the Markov chain solution given in Table 3.1. With
both conclusions, the requirement that $R$ be $O(N^{-1})$ does not appear
to be necessary for the agreement between Markov chain and diffusion
solutions. This no doubt occurs because, in any expression where it occurs,
$R$ is invariably multiplied by the coefficient of linkage disequilibrium
which, when $R$ is not small, is usually small itself.

Suppose now that mutation exists, so that there is a stationary
distribution of gene frequencies. We now change notation so that $x$, $y$ and
$D$ denote these stationary distribution frequencies. The expectation
$E(D^2)$ in this stationary distribution is of particular interest to us. We
write the mutation rates as $u_1$ from $A_1$ to $A_2$, $v_1$ from $A_2$ to
$A_1$, $u_2$ from $B_1$ to $B_2$ and $v_2$ from $B_2$ to $B_1$, and assume that
all mutation rates are $O(N^{-1})$. The drift and diffusion coefficients for
the changes in $x(t)$, $y(t)$ and $D(t)$ (now including terms in the mutation
rates) can be inserted in (4.84) to find the stationary expectation of any
function $g(x, y, D)$ of these variables. The equation so obtained (Ohta and
Kimura, 1969b) is

$$\begin{aligned}
E\Big(& x(1 - x)\frac{\partial^2 g}{\partial x^2} + y(1 - y)\frac{\partial^2 g}{\partial y^2} + 2D\frac{\partial^2 g}{\partial x\,\partial y} + 2D(1 - 2x)\frac{\partial^2 g}{\partial x\,\partial D} \\
&+ 2D(1 - 2y)\frac{\partial^2 g}{\partial y\,\partial D} + \{xy(1 - x)(1 - y) + D(1 - 2x)(1 - 2y) - D^2\}\frac{\partial^2 g}{\partial D^2} \\
&+ 4N\{v_1 - (u_1 + v_1)x\}\frac{\partial g}{\partial x} + 4N\{v_2 - (u_2 + v_2)y\}\frac{\partial g}{\partial y} - D(2 + 4NK)\frac{\partial g}{\partial D}\Big) = 0,
\end{aligned} \tag{6.75}$$

where $K = R + u_1 + u_2 + v_1 + v_2$ and the drift and diffusion coefficients
implied by the mutation rates are displayed in the equation. The expectation
is with respect to the joint stationary distribution of $x$, $y$, and $D$. Our
aim is to make suitable choices for $g$ so that three simultaneous equations
can be found from which $E(D^2)$ can be obtained. Three choices of $g$ which
do this are

$$g_1 = xy(1 - x)(1 - y), \tag{6.76}$$

$$g_2 = D(1 - 2x)(1 - 2y), \tag{6.77}$$

$$g_3 = D^2. \tag{6.78}$$

Inserting $g = g_3$ from (6.78) into (6.75), we obtain

$$E\big(g_1 + g_2 - g_3(3 + 4NK)\big) = 0. \tag{6.79}$$

The remaining two equations yield expectations for $g_1$, $g_2$ and $g_3$ in
terms of expectations for “lower-order” quantities such as $x(1 - x)$ and
$xy$. The expectations of these lower-order quantities may be found from
(6.75) by choosing $g = x(1 - x)$, $g = xy$, and so on. The joint solution of
the resulting equations and (6.79) gives

$$E(D^2) = \frac{\{2.5 + N(K + U)\}NA}{(1 + 2NU)(3 + 4NK)(2.5 + NU + NK) - 3 - 4NU}, \tag{6.80}$$

where $U = u_1 + u_2 + v_1 + v_2$ and

$$A = \frac{8Nu_1u_2v_1v_2}{(u_1 + v_1)(u_2 + v_2)}\left\{\frac{1}{4N(u_1 + v_1) + 1} + \frac{1}{4N(u_2 + v_2) + 1}\right\}. \tag{6.81}$$

We may usually assume mutation rates are sufficiently small so that $NU$ is
moderate. Unless the two loci are very closely linked, $NR$ and hence $NK$
will both be large, and in this case we find that

$$E(D^2) \approx A/\{4R(1 + 2NU)\}. \tag{6.82}$$

This expression is of the same order of magnitude as the mutation rate,
and will thus usually be very small. This reinforces the conclusion we
have reached above, that random processes in finite populations are of
minor importance in causing nonzero values of the coefficient of linkage
disequilibrium.

Some interest also attaches to the standardized linkage disequilibrium
$\sigma_D^2$, defined by

$$\sigma_D^2 = E(D^2)/E\{xy(1 - x)(1 - y)\}. \tag{6.83}$$

The expectations in the denominator can be computed using (6.75)-(6.78),
and we find that if $NR$ is large,

$$\sigma_D^2 \approx (4NR)^{-1}. \tag{6.84}$$

This is again small. If the two loci are tightly linked so that $NR$ is not
large, a more accurate expression (Ohta and Kimura (1969b)) is

$$\sigma_D^2 = [3 + 4NK - 2(2.5 + NK + NU)^{-1}]^{-1}. \tag{6.85}$$
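
The formulas (6.80)-(6.85) are simple to evaluate, and doing so shows how
quickly the large-$NR$ approximations become adequate. The sketch below is
illustrative: the function names and parameter values are assumptions, and
the form of $A$ coded here is the reading of (6.81) in which the denominator
is $(u_1 + v_1)(u_2 + v_2)$ and the two reciprocals are summed.

```python
def mutation_term_A(N, u1, v1, u2, v2):
    """The quantity A of (6.81), in the form assumed in the lead-in."""
    prefactor = 8.0 * N * u1 * u2 * v1 * v2 / ((u1 + v1) * (u2 + v2))
    return prefactor * (1.0 / (4.0 * N * (u1 + v1) + 1.0)
                        + 1.0 / (4.0 * N * (u2 + v2) + 1.0))


def expected_D2(N, R, u1, v1, u2, v2):
    """Stationary E(D^2) from (6.80)."""
    U = u1 + u2 + v1 + v2
    NU, NK = N * U, N * (R + U)
    numerator = (2.5 + NK + NU) * N * mutation_term_A(N, u1, v1, u2, v2)
    denominator = (1.0 + 2.0 * NU) * (3.0 + 4.0 * NK) * (2.5 + NU + NK) - 3.0 - 4.0 * NU
    return numerator / denominator


def expected_D2_loose_linkage(N, R, u1, v1, u2, v2):
    """Large-NR approximation (6.82)."""
    U = u1 + u2 + v1 + v2
    return mutation_term_A(N, u1, v1, u2, v2) / (4.0 * R * (1.0 + 2.0 * N * U))


def sigma2_D(N, R, U):
    """Standardized linkage disequilibrium from (6.85)."""
    NU, NK = N * U, N * (R + U)
    return 1.0 / (3.0 + 4.0 * NK - 2.0 / (2.5 + NK + NU))


def sigma2_D_loose_linkage(N, R):
    """Large-NR approximation (6.84)."""
    return 1.0 / (4.0 * N * R)


if __name__ == "__main__":
    N, R = 1000, 0.1                 # illustrative values only
    u1 = v1 = u2 = v2 = 1e-5
    print(expected_D2(N, R, u1, v1, u2, v2),
          expected_D2_loose_linkage(N, R, u1, v1, u2, v2))
    print(sigma2_D(N, R, 4e-5), sigma2_D_loose_linkage(N, R))
    print(sigma2_D(N, 0.0, 4e-5))    # complete linkage: (6.84) is unusable
```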



6.7 Associative Overdominance and Hitchhiking 

We consider in this section two concepts of potential practical interest 
which arise in finite populations with nonzero values of the coefficient of 
linkage disequilibrium. 

The first concept is that of associative overdominance. This was
introduced by Frydenberg (1963) to explain secular gene frequency changes in
certain experimental Drosophila populations. The essence of the notion is
that whereas the genes at the locus of interest may be selectively
equivalent, they appear to exhibit overdominance by being linked (with nonzero
values of linkage disequilibrium) to a locus where true overdominance does
occur. The most detailed theoretical treatment of this concept is by Kimura
and Ohta (1971a, pp. 110-116).

Consider two loci $A$ and $B$ for which true overdominance occurs at the
$A$ locus while the $B$ locus is selectively neutral, so that the fitnesses of
the various genotypes are

Genotype    $A_1A_1B{-}B{-}$    $A_1A_2B{-}B{-}$    $A_2A_2B{-}B{-}$
Fitness     $1 - s_1$           $1$                 $1 - s_2$

We denote the frequencies of $A_1$ and $B_1$ by $x$ and $y$ respectively. Then
the frequencies of the gametes $A_1B_1$ and $A_2B_1$ are $xy + D$ and
$(1 - x)y - D$, and thus the marginal fitness of $B_1B_1$ individuals (see
(2.96)) is

$$\begin{aligned}
& y^{-2}\big[(xy + D)^2(1 - s_1) + 2(xy + D)\{(1 - x)y - D\} + (1 - s_2)\{(1 - x)y - D\}^2\big] \\
&\quad = 1 - s_1(x + y^{-1}D)^2 - s_2(1 - x - y^{-1}D)^2.
\end{aligned} \tag{6.86}$$

Similarly the marginal fitness of $B_1B_2$ individuals is

$$1 - s_1(x + y^{-1}D)\{x - (1 - y)^{-1}D\} - s_2\{1 - x + (1 - y)^{-1}D\}(1 - x - y^{-1}D), \tag{6.87}$$

while that of $B_2B_2$ individuals is

$$1 - s_1\{x - (1 - y)^{-1}D\}^2 - s_2\{1 - x + (1 - y)^{-1}D\}^2. \tag{6.88}$$

The apparent selective advantage of $B_1B_2$ over $B_1B_1$ is

$$s_1D(x + y^{-1}D)\{y(1 - y)\}^{-1} - s_2D(1 - x - y^{-1}D)\{y(1 - y)\}^{-1} \tag{6.89}$$

and over $B_2B_2$ is

$$-s_1D\{x - (1 - y)^{-1}D\}\{y(1 - y)\}^{-1} + s_2D\{1 - x + (1 - y)^{-1}D\}\{y(1 - y)\}^{-1}. \tag{6.90}$$

There is one case of these formulas of special interest. If the selection at
the $A$ locus is so strong that we may assume $x = x^* = s_2/(s_1 + s_2)$, the
apparent selective advantages of $B_1B_2$ become

$$(s_1 + s_2)D^2/\{y^2(1 - y)\} \quad \text{and} \quad (s_1 + s_2)D^2/\{y(1 - y)^2\} \tag{6.91}$$

respectively. These are non-negative, so that in this case, if there is
nonzero linkage disequilibrium between selected and neutral loci, apparent
(or associative) overdominance will exist at the neutral locus. Clearly the
extent of this effect will depend on the value of $D^2$ or, in the more
general case where we cannot assume $x = s_2/(s_1 + s_2)$, on $D$ and $D^2$. We
now discuss how large this effect might be when linkage disequilibrium is
generated from stochastic processes in finite populations.
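
The reduction of the apparent selective advantages to the two expressions
$(s_1 + s_2)D^2/\{y^2(1 - y)\}$ and $(s_1 + s_2)D^2/\{y(1 - y)^2\}$ when
$x = s_2/(s_1 + s_2)$ can be confirmed numerically from the marginal
fitnesses. The sketch below is illustrative: the function name and the
numerical values are assumptions. It uses the fact that the frequency of
$A_1$ among $B_1$ gametes is $x + D/y$ and among $B_2$ gametes is
$x - D/(1 - y)$.

```python
def marginal_fitnesses(x, y, D, s1, s2):
    """Marginal fitnesses of B1B1, B1B2 and B2B2 when A1A1, A1A2 and A2A2
    have fitnesses 1 - s1, 1 and 1 - s2 and the B locus is neutral."""
    a = x + D / y            # frequency of A1 among B1 gametes
    b = x - D / (1.0 - y)    # frequency of A1 among B2 gametes

    def fitness(p, q):
        # An individual formed from gametes carrying A1 with
        # probabilities p and q respectively.
        return 1.0 - s1 * p * q - s2 * (1.0 - p) * (1.0 - q)

    return fitness(a, a), fitness(a, b), fitness(b, b)


if __name__ == "__main__":
    s1, s2 = 0.2, 0.3                # illustrative values only
    x_star = s2 / (s1 + s2)          # equilibrium frequency of A1
    y, D = 0.4, 0.05
    w11, w12, w22 = marginal_fitnesses(x_star, y, D, s1, s2)
    print(w12 - w11, (s1 + s2) * D ** 2 / (y ** 2 * (1.0 - y)))
    print(w12 - w22, (s1 + s2) * D ** 2 / (y * (1.0 - y) ** 2))
```

Each printed pair should agree up to rounding error, and both advantages are
positive whatever the sign of $D$.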

The formula we have derived for $E(D^2)$ in the previous section assumes
no selective effects at either locus and must therefore be generalized to
cover the present model. This has been done by Ohta and Kimura (1970)
assuming that selection is so strong that $x = x^*$. It is found that

$$E(D^2) = \frac{x^*(1 - x^*)E\{y(1 - y)\}}{1 + 4N(R + u + v) + \dfrac{(1 - 2x^*)^2}{x^*(1 - x^*)}\,\dfrac{N(R + 2u + 2v)}{1 + N(R + 2u + 2v)}}. \tag{6.92}$$

Here $u$ and $v$ are the mutation rates at the $B$ locus. If $R$ is not
extremely small, this expression is $O(N^{-1})$ and hence is very small. We
may thus expect little effect of associative overdominance in this case.
Similarly, when there is no mutation and fixation at both loci eventually
occurs, the effect of linkage disequilibrium, while perhaps initially
nontrivial, eventually becomes negligibly small, so that associative
overdominance is, in this case, a transient phenomenon. Extensions of these
conclusions to the case where several overdominance loci are linked to the
neutral locus are given by Ohta and Kimura (1970).

We turn now to the concept of hitchhiking. Hitchhiking occurs when the
gene frequencies at a neutral locus are affected by those at a linked selected
locus where a favorable allele is proceeding towards fixation. As the name
implies, we are mainly interested in the extent to which the frequency of
one allele at the neutral locus increases through linkage to the favored
allele. Aspects of this possibility have been examined by Maynard Smith
and Haigh (1974), Haigh and Maynard Smith (1976), Ohta and Kimura
(1975, 1976) and Thomson (1977).

Haigh and Maynard Smith consider a somewhat different question than 
do Ohta and Kimura. They assume an initial polymorphism at the neutral 
locus and, supposing that a substitution then occurs at the selected locus, 
focus attention on the expected final value of heterozygosity at the neutral 
locus when the substitution ceases. Ohta and Kimura, on the other hand, 
imagine a new mutation to arise at a neutral locus while a selected locus is 
substituting, and consider the effect on the expected total heterozygosity 
at the neutral locus of the selected substitution. Which consideration is the 
more relevant biologically is not clear, and both sets of authors argue for 
their own viewpoint. 

The purely mathematical discussion is less controversial, and we consider
first the analysis of Ohta and Kimura. In order to have a standard of
reference we consider the model (1.48), which concerns a selectively neutral
locus without reference to any linked loci. If a single $A_1$ mutant arises in
an otherwise pure $A_2A_2$ population, the number of $A_1$ genes will take the
value $j$ for an average of $2j^{-1}$ generations (see (1.56)). This means, on
average, that the total heterozygosity created by this mutation is

$$\sum_{j=1}^{2N-1} 2j(2N - j)(2N)^{-2}\, 2j^{-1} \approx 2. \tag{6.93}$$

We shall take this value as a standard against which values computed under
hitchhiking may be compared.
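
The sum in (6.93) can be evaluated exactly for any given $N$. The following
sketch (the function name is an illustrative assumption) uses rational
arithmetic and shows that the sum equals $2 - N^{-1}$, so that the standard
value 2 is approached from below as $N$ increases.

```python
from fractions import Fraction


def neutral_total_heterozygosity(N):
    """Exact value of the sum in (6.93) for a diploid population of size N."""
    two_n = 2 * N
    total = Fraction(0)
    for j in range(1, two_n):
        heterozygosity = Fraction(2 * j * (two_n - j), two_n ** 2)
        mean_generations_at_j = Fraction(2, j)
        total += heterozygosity * mean_generations_at_j
    return total


if __name__ == "__main__":
    for N in (10, 100, 1000):
        print(N, float(neutral_total_heterozygosity(N)))
```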

Suppose the $A$ locus is selectively neutral while, at the $B$ locus, the
favored allele $B_1$ is steadily replacing $B_2$. We may assume, to a
reasonable approximation, that this replacement is deterministic, so that the
frequency $y$ of $B_1$ satisfies the differential equation

$$\frac{dy}{dt} = sy(1 - y). \tag{6.94}$$

(For convenience, this assumes no dominance at the $B$ locus.) We denote by
$x_1$ the frequency of $A_1$ among $B_1$ chromosomes and by $x_2$ the frequency
of $A_1$ among $B_2$ chromosomes. The total frequency of $A_1$ is thus
$x = yx_1 + (1 - y)x_2$, and our aim is to compute the expected value of the
function $H$, defined by

$$H = \int_0^\infty 2x(1 - x)\, dt. \tag{6.95}$$

Ohta and Kimura approach this problem by using (4.83). Differential
equations for the expected value of $x_1^2$, $x_1x_2$ and $x_2^2$ are found from
(4.83) by successively using these functions for $g(\cdot)$. These equations
must be solved numerically and the solutions inserted in (6.95). These depend
on the initial values of $x_1$ and $x_2$, the selective coefficient $s$, the
recombination fraction $R$ between $A$ and $B$ loci and the value $y_0$ assumed
for $y$ when the initial mutation at the $A$ locus takes place. Ohta and
Kimura (1975) give separate values for $E(H)$ depending on whether the
initial mutant lies on a $B_1$ or $B_2$ chromosome. For our purposes it is
probably convenient to consider the weighted average

$$E(H) = y_0E_1(H) + (1 - y_0)E_2(H), \tag{6.96}$$

where $E_i(H)$ is the expected value of $H$ assuming the initial mutant is on
a $B_i$ chromosome. In Table 6.4 we give values of $E(H)$ for various values
of $s$ and $y_0$ for the values $R = 0.1$, $N = 100$, computed from those of
Ohta and Kimura (1975). It will be seen that $E(H)$ does not differ
substantially from the value 2, computed without taking linked loci into
account, and from this point of view we may conclude, with Ohta and Kimura in
this case, that hitchhiking is of comparatively small importance in altering
the value of total mean heterozygosity at the neutral locus. Although we have
considered only one value for $N$ and one value for $R$ in Table 6.4, the
general conclusion reached applies for a very wide range of $R$ and $N$
values, and is appreciably in error only when $NR < 5$ and $Ns < 100$. The
minimum expected value of $H$ is about 1.2, and occurs when $NR \approx 0$,
$Ns \approx 5$.

              $y_0 = 0.1$   $y_0 = 0.2$   $y_0 = 0.5$
$s = 0.05$    1.97          1.97          1.97
$s = 0.10$    1.94          1.94          1.96

Table 6.4. Values of $E(H)$ for various values of $s$ and $y_0$ with $R = 0.1$, $N = 100$

Maynard Smith and Haigh regarded it as more relevant biologically to
consider the effect of a selective substitution on an existing neutral
polymorphism, and did so by comparing the expected final heterozygosity in
this case with that existing before the selective substitution starts. They
show that if $R \ll s \ll 1$, the ratio of the final heterozygosity $H_\infty$
(when the selective substitution has taken place) to the initial
heterozygosity $H_0$ is of the form

$$\text{const} \times Rs^{-1}, \tag{6.97}$$

where the constant depends on the initial gamete configuration. Under
the assumptions made the quantity (6.97) is quite small. However, Ohta
and Kimura (1975) computed $H_\infty/H_0$ for a far wider range of $R$ and $s$
values and conclude that unless $Ns$ is small ($< 100$, approximately) then
$H_\infty/H_0 \approx 1$. In particular this is true if $R > s$, and thus in
this case the effect of hitchhiking is negligible.

One of the theoretical results of Maynard Smith and Haigh (1974) can
be used in a test for a hitchhiking event, and is discussed in more detail in
Section 11.3.5. Consider a diploid population of $N$ individuals and suppose
that initially only $A_2A_2$ individuals exist in the population. A single
favorable new $A_1$ mutant now arises at this locus. The fitnesses of
$A_1A_1$, $A_1A_2$ and $A_2A_2$ are $1 + 2s$, $1 + s$ and 1 respectively, with
$s > 0$, and as a result $A_1$ increases in frequency, under a deterministic
analysis, from a frequency $(2N)^{-1}$ to a value close to 1. We call this
event a selective sweep. Suppose that the recombination fraction between this
selected locus and a closely linked neutral locus is $R$ and that at the
neutral locus there are two alleles $B_1$ and $B_2$, with frequencies $x$ and
$1 - x$ at the start of the selective sweep. Then under a deterministic
theory, if the new mutant $A_1$ gene is on the same gamete as $B_1$ before the
selective sweep, the frequency of $B_1$ will increase from $x$ to the value
$1 - c + cx$ at the conclusion of the selective sweep, where, to a reasonable
approximation,

$$c = \frac{R \log(2N)}{s}. \tag{6.98}$$

If the new mutant $A_1$ gene is on the same gamete as $B_2$ before the
selective sweep, the frequency of $B_1$ will decrease to $cx$ at the
conclusion of the selective sweep.

We return to this calculation in Section 11.3.5, and remark here only
that the possibility of hitchhiking during a selective sweep is one reason
why we should view much of the single-locus theory of population genetics,
where loci are treated individually with no regard to events at linked loci,
with some caution.
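
To give a feeling for the magnitudes involved in (6.98), the sketch below
evaluates $c$ and the final frequency of $B_1$ for hypothetical parameter
values. The function names are illustrative, the logarithm is assumed to be
the natural logarithm, and the truncation of $c$ at 1 is a guard added here
for loosely linked loci, where the approximation (6.98) is not intended to
apply.

```python
import math


def sweep_constant(R, s, N):
    """The quantity c of (6.98), with log taken as the natural logarithm."""
    return R * math.log(2 * N) / s


def final_frequency_of_B1(x, R, s, N, mutant_on_B1_gamete=True):
    """Frequency of B1 at the end of the sweep, starting from frequency x."""
    c = min(1.0, sweep_constant(R, s, N))   # guard, not part of (6.98)
    if mutant_on_B1_gamete:
        return 1.0 - c + c * x
    return c * x


if __name__ == "__main__":
    R, s, N, x = 1e-4, 0.01, 10000, 0.3     # illustrative values only
    print(sweep_constant(R, s, N))
    print(final_frequency_of_B1(x, R, s, N, mutant_on_B1_gamete=True))
    print(final_frequency_of_B1(x, R, s, N, mutant_on_B1_gamete=False))
```

With complete linkage ($R = 0$) the allele on the mutant gamete is carried to
fixation, and the effect weakens as $R/s$ increases.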



6.8 The Evolutionary Advantage of Recombination 

We have noted above that the mean fitness of a random-mating popula- 
tion is maximized when the recombination fraction between the two loci 
considered is zero. (This conclusion may be generalized to cover an arbi- 
trary number of loci with an arbitrary number of alleles at each locus.) 
Why then have populations not evolved so that recombination does not 
exist? Even if we were to discount the use of mean fitness as a measure 
of success in intergroup competition and, further, assert that in any event 
recombination is determined more by the evolution of modifier genes than 
by such competition, we must still find an answer to this question, since 
in the analyses we have so far mentioned, such modifiers often act so as to 
reduce recombination. 

Our aim then is to find what advantage the existence of recombination
might confer on a population. In this context we shall mean sexual
recombination: we do not consider the possible advantages of recombination in
asexual populations. Thus, in what follows we assume the existence of two
sexes with identical fitness patterns. A generalization of the theory of
Section 2.3 shows that the frequency of any gamete will be the same in males
and females, so that in the quantitative discussion of recombination no
explicit recognition of the existence of the two sexes is necessary.

The classical argument for the existence of sexual recombination is that 
of Fisher (1930a) and Muller (1932), namely that recombination favors 
the incorporation into the population of favorable new alleles arising at 
different loci, since recombination is more efficient in allowing such favored 
genes to occur in the same individual. A verbal discussion proceeds as 
follows. Suppose a favorable mutation $A_1$ arises at a locus $A$ and begins
to spread throughout a population. If a favorable mutation $B_1$ subsequently
arises at a locus $B$, then without recombination $A_1$ and $B_1$ cannot both
become simultaneously fixed unless the initial $B_1$ mutation happens to arise
on an $A_1$ chromosome. This is unlikely to occur until the frequency of $A_1$
is substantial, and thus either the evolution at other loci is slowed down by
the evolution of the $A$ locus or the favorable mutation $A_1$ is lost through
the increase in frequency of $B_1$ at the $B$ locus, and hence of the linked
allele $A_2$. With recombination, both $A_1$ and $B_1$ genes can eventually
arise on the same chromosome so that evolution, under this argument, proceeds
more rapidly than with no recombination.

This argument clearly assumes that ultimately the advantage to the
population with recombination will arise through intergroup competition.
(Recall that Fisher’s original argument for decreased recombination also
makes this assumption.) It would be fitting, in line with our discussion
of the evolution of modifier genes, to attempt to produce an argument
that does not rely ultimately on such an assumption: We mention such
arguments later. We emphasize again that our aim is to compare systems
with no recombination ($R = 0$) to those with positive recombination
($R > 0$): This is a different question to comparing two populations with
positive recombination rates $R_1$, $R_2$ respectively. It may well be that,
since high recombination breaks up favorable gene complexes as well as
creating them, the incorporation of favorable new mutants proceeds best, at
least in some circumstances, in populations with low but positive
recombination. To repeat, this is not the comparison that is being made.

The first attempt to quantify the Fisher-Muller theory was by Crow and
Kimura (1965). Crow and Kimura assume that favorable new mutations arise
in a population of size $N$ at total rate $NU$ per generation. The new
favorable mutations are all at different loci, and mutation is
nonrecurrent. We may suppose for convenience that each new mutant has
selective advantage $s$ with no dominance. While (see Section 1.4) most
favorable new mutations will be lost from the population, we may expect an
equal fraction to be lost with and without recombination, so that this
random loss may safely be ignored and all processes treated as
deterministic. We assume finally that in a population without
recombination, on average $g$ generations pass between the occurrence of
a favorable new mutation and the occurrence of a second favorable
mutation in a descendant of the first. Then in such a population favorable
new mutations are incorporated into the population at a rate of one per $g$
generations.

In a population with recombination, all favorable new mutations during $g$
generations can be incorporated, and hence since $NU$ favorable mutations
arise per generation, $NUg$ arise during $g$ generations. As far as the rate
of incorporation of favorable new mutations is concerned, then, the advantage
of recombination is $NUg : 1$, and in order to discuss this ratio more
usefully, it is necessary to find a formula for $g$ in terms of $N$, $U$
and $s$.

We have assumed no dominance and a selective advantage s to single 
mutants. The frequency x of individuals carrying a favorable new mutant 
is then given by (1.27) if we put $h = \tfrac{1}{2}$ and replace $s$ by $2s$.
Under the initial condition $x = N^{-1}$, the solution of (1.27) is clearly

$$x = \{1 + (N - 1)e^{-st}\}^{-1}. \tag{6.99}$$

Thus, in the first $t$ generations after the occurrence of this mutation the
total number of its descendants is

$$N \int_0^t x \, dt = N s^{-1} \log\{(N - 1 + e^{st})/N\}. \tag{6.100}$$

The total number of favorable new mutations in these descendants is found
by multiplying this quantity by $U$, and thus $g$ is found as the solution of
the equation

$$1 = N U s^{-1} \log\{(N - 1 + e^{sg})/N\}.$$

This gives immediately

$$g = s^{-1} \log\{N(e^{s/(NU)} - 1) + 1\},$$

and thus the ratio of the rate of incorporation of advantageous mutations
in populations with recombination to that in populations without becomes,
under this analysis,

$$N U s^{-1} \log\{N(e^{s/(NU)} - 1) + 1\} : 1. \tag{6.101}$$

Several limiting cases of this formula are of interest. Suppose U is ex- 
tremely small, so that favorable new mutations arise very rarely. We may 
then expect that each favorable mutation is incorporated in the popu- 
lation before the next arises, and in this case there is no advantage to 
recombination. This argument is confirmed by noting that the ratio (6.101)
approaches unity as $U \to 0$. Similarly for very large $s$ the incorporation
of each new favorable mutant should be very rapid, and we again confirm that
the ratio (6.101) approaches unity as $s \to \infty$. Clearly the situation
when recombination is most favored is when $U/s$ is large and $N$ is large.
Crow and
Kimura (1965) give a table of values of (6.101) for various combinations 
of N, U and s values which document this. 
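
The limiting behavior just described is easy to explore numerically. The
following is a minimal sketch, not Crow and Kimura's own tabulation: it
evaluates $g$ and the ratio (6.101) directly from the formulas above. The
function names and the parameter values in the loop are illustrative
assumptions, and the branch for large $s/(NU)$ is only a rearrangement of the
same expression that avoids floating-point overflow.

```python
import math


def log_term(N, a):
    """Return log{N(e^a - 1) + 1} without overflowing for large a."""
    if a < 50.0:
        return math.log1p(N * math.expm1(a))
    # For large a use N(e^a - 1) + 1 = N e^a {1 - e^(-a) (1 - 1/N)}.
    return math.log(N) + a + math.log1p(-math.exp(-a) * (1.0 - 1.0 / N))


def generations_between_mutations(N, U, s):
    """g = s^(-1) log{N(e^(s/NU) - 1) + 1}, the solution given in the text."""
    return log_term(N, s / (N * U)) / s


def recombination_advantage(N, U, s):
    """The ratio (6.101), that is N U g : 1."""
    return N * U * generations_between_mutations(N, U, s)


if __name__ == "__main__":
    # Hypothetical parameter combinations, chosen only for illustration.
    for N, U, s in [(1e3, 1e-8, 0.01), (1e6, 1e-8, 0.01),
                    (1e6, 1e-6, 0.01), (1e9, 1e-6, 0.01)]:
        ratio = recombination_advantage(N, U, s)
        print(f"N={N:.0e}  U={U:.0e}  s={s}  ratio={ratio:.3g}")
```

Running the loop shows the ratio staying close to unity when $U$ is very
small and growing rapidly as $N$ and $U/s$ increase, in line with the limiting
cases noted above.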

This conclusion was challenged by Maynard Smith (1968), who produced 
a “counter-example” in which the existence of recombination made no dif- 
ference to the rate at which two favorable alleles increased in frequency. 
Maynard Smith considered unfavorable alleles at two loci which are main- 
tained at low frequency in a population through recurrent mutation and 
showed that the gamete frequencies would then be in linkage equilibrium. 
Suppose now that the environment alters and that both rare alleles are fa- 
vored, with a multiplicative fitness scheme of the form (6.28), and steadily 
increase in frequency. We have stated earlier that with a multiplicative fit- 
ness scheme a population having zero linkage disequilibrium initially will 
persist in a state of zero linkage disequilibrium. In this case, the value of the 
recombination fraction R is irrelevant to the rate of increase of frequency 
of the two alleles if there is no linkage disequilibrium, since R appears 
only as a multiplier of the coefficient of linkage disequilibrium. Hence the 
rate of incorporation of the rare alleles is unaffected by the existence of
recombination.

As pointed out by Maynard Smith (1968) and Crow and Kimura (1969),
there is an essential difference between the assumptions made in the two
analyses. Maynard Smith’s analysis assumes that gametes carrying both
initially unfavored alleles exist at positive frequency. Crow and Kimura’s
analysis assumes zero initial frequency for such gametes and, as is clear
from (2.94) with $R = 0$ or by general reasoning, such gametes can never
arise in a population without recombination if their initial frequency is
zero. More generally, Maynard Smith’s claim is that the essential difference
between the two analyses arises because in his analysis favorable new
alleles arise in a recurrent process, whereas in that of Crow and Kimura
they arise uniquely. Clearly if favorable new alleles do arise recurrently at 
a sufficiently high frequency, then even without recombination a favored 
new mutation at the B locus can arise on a chromosome carrying a favored 
mutation at the A locus in the course of fixation. 
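
The contrast between the two sets of assumptions can be checked numerically.
The sketch below iterates a two-locus, two-allele recurrence of the standard
form $\bar{w} c_i' = c_i w_i \pm R w_{14} D$ with $D = c_1 c_4 - c_2 c_3$,
which we take to be the form of (2.94). The gamete ordering, the function
names and the particular fitness values are illustrative assumptions and are
not taken from the text.

```python
def multiplicative_fitnesses(a, b):
    """w[i][j] = a[copies of A1] * b[copies of B1] for the genotype (i, j).

    Gametes are indexed 0: A1B1, 1: A1B2, 2: A2B1, 3: A2B2.
    """
    has_A1 = (1, 1, 0, 0)
    has_B1 = (1, 0, 1, 0)
    return [[a[has_A1[i] + has_A1[j]] * b[has_B1[i] + has_B1[j]]
             for j in range(4)] for i in range(4)]


def next_generation(c, w, R):
    """One generation of the two-locus recurrence with recombination R.

    Assumes coupling and repulsion double heterozygotes are equally fit
    (w[0][3] == w[1][2]), as is the case for multiplicative fitnesses.
    """
    marginal = [sum(w[i][j] * c[j] for j in range(4)) for i in range(4)]
    w_bar = sum(c[i] * marginal[i] for i in range(4))
    D = c[0] * c[3] - c[1] * c[2]
    sign = (-1.0, 1.0, 1.0, -1.0)
    return [(c[i] * marginal[i] + sign[i] * R * w[0][3] * D) / w_bar
            for i in range(4)]


def iterate(c, w, R, generations):
    """Apply next_generation repeatedly and return the final frequencies."""
    for _ in range(generations):
        c = next_generation(c, w, R)
    return c


if __name__ == "__main__":
    w = multiplicative_fitnesses(a=(1.0, 1.05, 1.05 ** 2),
                                 b=(1.0, 1.03, 1.03 ** 2))

    # Maynard Smith's case: both rare alleles present, linkage equilibrium.
    p, q = 0.01, 0.02
    start = [p * q, p * (1 - q), (1 - p) * q, (1 - p) * (1 - q)]
    for R in (0.0, 0.5):
        print("linkage equilibrium start, R =", R, iterate(start, w, R, 100))

    # Crow and Kimura's case: the double mutant gamete A1B1 is absent.
    start = [0.0, 0.01, 0.01, 0.98]
    for R in (0.0, 0.5):
        print("no double mutant, R =", R, iterate(start, w, R, 100))
```

In the first case the trajectories for $R = 0$ and $R = 0.5$ agree to within
rounding error, while in the second the double mutant gamete remains absent
when $R = 0$ and appears only when $R > 0$.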

Crow and Kimura’s (1969) claim is that their argument does not as- 
sume unique favorable mutants, but rather that these occur sufficiently 
rarely so that double mutants arise very seldom (or, more generally, if 
many loci are substituting simultaneously, that n-tuple mutants arise with 
completely negligible frequency). Thus the real essence of their argument 
remains unchanged. Maynard Smith (1971) carried out an analysis drop- 
ping the assumption of zero initial linkage disequilibrium but incorporating 
a recurrent mutation rate of favorable mutations, and concluded that for 
moderate populations ($N \sim 10^6$) there is little advantage to recombination,
while for large populations ($N \sim 10^{10}$) populations with recombination can
incorporate favorable new mutations about four or five times faster than 
populations without recombination. 

It is clear that the final answers to these questions rely on biological 
arguments concerning the most likely circumstances from which a micro- 
evolutionary process begins and, more generally, on the main nature of 
evolution. If evolution is mainly of the “shifting balance” type of Wright, 
outlined in Chapter 1, where gene frequencies are high throughout, the 
argument of Crow and Kimura does not apply. If, however, evolution
depends more on the incorporation of rare favorable mutations, their
argument is much more important. The extent to which this is so will de- 
pend on the rate of occurrence of favorable new mutations, the population 
size, and the selective advantage of the new mutant. 

A number of interesting quantitative conclusions have been found when 
mutation to favorable new alleles is recurrent. Thus when the double 
mutant is more fit than multiplicative fitnesses would imply, Eshel and 
Feldman (1970) demonstrate that with no recombination, the frequency of
the double mutant gamete is always larger than it is with positive
recombination, provided that the initial disequilibrium $c_1 c_4 - c_2 c_3$
is non-negative, where $c_1$ is the initial frequency of the double mutant
gamete. They further show that when single mutants are deleterious but the
double mutant advantageous, for suitable fitness values and sufficiently low
mutation rate the two mutants will increase in frequency only if the linkage
between the
two loci is sufficiently tight. This conclusion was essentially given also by 
Crow and Kimura (1965). Karlin (1973) considered stochastic versions of 
the process, paying particular attention to the mean time until the first 
double mutant gamete is formed and the mean time until fixation of the 
double mutant gamete. 

All the above arguments concern long-term optimization and ultimately 
rely on intergroup competition for the establishment of the population 
with the optimal recombination value. Short-term arguments have been 
offered by Williams (1966, 1975) and Williams and Mitton (1973). These 
center around the claim that under intense selection, populations having 
high recombination have an immediate advantage over populations with 
no recombination because they produce more high-fitness genotypes, which
are assumed to be the only genotypes to survive. This argument has been
contested by Maynard Smith (1971). Felsenstein and Yokoyama (1976) in- 
troduce a locus which modifies recombination between primary loci and 
discuss verbally and by simulation the fate of the allele causing no recom- 
bination as favorable mutations arise and become fixed at the primary loci. 
This argument is in the spirit of Section 6.5 and avoids group-competition 
arguments. Unfortunately the complexities of the argument make a mathe- 
matical analysis well-nigh impossible. The argument relies on the existence 
of randomly generated linkage disequilibrium in finite populations, and thus 
the analysis of Section 6.6, which suggests that unless population sizes are 
small such linkage disequilibrium is rarely large, becomes relevant. 



6.9 Summary 

In the preceding sections we have outlined several conclusions concerning 
the joint evolution at two linked loci. A number of topics have not been 
discussed. These include the effect of population subdivision (Feldman and 
Christiansen (1975), Nei and Li (1973)), the effect of allowing several alleles 
at one locus (Feldman et al. (1975)) and the effect of different recombina- 
tion fractions in the two sexes (Strobeck (1974)). We cannot hope to cover 
here all possible extensions and generalizations. Far and away the most im- 
portant question concerns the degree of linkage disequilibrium in natural 
populations. We have observed, and will note again in the next chapter, 
that if linkage disequilibrium can normally be taken as negligibly small, 
a great simplification can be made to the theory. Loci can essentially be 
examined one by one, with interactive effects between loci being of minor 
importance. Ohta and Kimura (1975), for example, claimed that linkage 
disequilibrium in nature is comparatively rare and that such simplifying 
assumptions, which allow us to carry the theory a considerable distance, 
can reasonably be made. Thus they assert that “for large and stable
populations the concept of quasi-linkage equilibrium together with the single
locus theory is sufficient to treat most problems realistically”. Lewontin
(1974), on the other hand, emphasized the role of linkage and linkage dis- 
equilibrium in evolution, and Wright, as we have noted, also emphasized 
interactive effects of loci. Under the latter view evolution is far more com- 
plex than would appear under the “single-locus reductionist” approach, 
and its quantitative assessment is, as a result, extremely difficult. It can- 
not be claimed, even now, after thirty years, that a weight of evidence has 
yet accumulated on either side. In the next chapter we carry the theory to 
many loci and note the additional complications, compared to those in a 
two-locus analysis, that then arise. 




7 

Many Loci 



7.1 Introduction 

Our aim in this chapter is to outline certain properties of populations when 
it is assumed that the various characteristics, and in particular the fitness of 
any individual, depend on his genetic constitution at all loci in the genome. 
Although we often assume random mating and/or particular forms for var- 
ious parameter values, since the analysis can be carried further when these 
assumptions are made, we also consider cases where no such assumptions 
are made. In particular we shall prove the fully general version of the Funda- 
mental Theorem of Natural Selection, where no assumption is made about 
the mating scheme, random or otherwise, about the fitness values, the num- 
ber of loci that fitness depends on or the number of possible alleles at each 
locus. Thus we consider here both general and specific cases, and base our 
overall conclusions on the results deriving from both. 

This chapter has two intertwining themes. The first concerns the rela- 
tionship between properties of the entire multilocus system and those of the 
various subsystems, in particular single-locus subsystems, that the entire 
system defines. The second concerns linkage disequilibrium and its effect 
on static and dynamic properties of the population. The former theme is 
of interest because, while many properties of individuals depend on genes 
at a large number of loci, experiments often involve one or a small num- 
ber of loci, and it is clearly important to assess the extent to which valid 
inferences can be made from the loci investigated to the entire system
involved. We shall find that to a great extent, the validity of these
inferences depends on the amount of linkage disequilibrium. For highly
structured
systems, that is those possessing high linkage disequilibrium values, these 
inferences can be of dubious value. For unstructured systems, with little or 
no disequilibrium, the inferences are more likely to be valid. 

A large literature on multilocus theory is now available. We do not aim 
to cover this here, focussing on those aspects of the theory relating to the 
themes discussed above. 



7.2 Notation 

Since the notation for multilocus systems can become remarkably confus- 
ing, we collect here most of the notation that will be used in this chapter: 
This notation does not necessarily conform to that used in other chapters. 

We suppose the entire genetic system, or genome, to consist of $K$ loci, the
generic symbol used for a locus being $k$ (and on occasions where two loci
are considered simultaneously, $k$ and $\ell$). Thus $k$ takes the values
$1, 2, \ldots, K$. As far as possible we use upper-case symbols for fixed
quantities such as the number of loci in a system, and the corresponding
lower-case symbol for generic or typical values. We suppose there are $I_k$
alleles possible at locus $k$, so that there will exist $I = \prod_k I_k$
different $K$-locus gametes, which are assumed labeled in some agreed fashion
$1, 2, \ldots, i, \ldots, I$.

We are also interested in one-, two-, and in general $H$-locus subsystems
of the entire $K$-locus system. The alleles at locus $k$ are labeled
$A_{k1}, A_{k2}, \ldots, A_{kI_k}$. Two-locus gametes are described by the
alleles at the two loci at which they are defined, for example
$(A_{ku}, A_{\ell v})$. For general $H$-locus systems there will exist
$Q = \prod I_k$ $H$-locus gametes, where the product is taken over all $H$
loci in the subsystem. These are also assumed to be labeled in some agreed
fashion $1, 2, \ldots, q, \ldots, Q$.

We now turn to the frequencies of the various alleles and gametes, and
will consistently use the notation $x$ for gene frequencies, $y$ for
frequencies of two-locus gametes, $z$ for frequencies of $H$-locus gametes
and $c$ for frequencies of $K$-locus gametes. Finally we consider all $G$
genotypes in the entire $K$-locus system, and assume that these are listed
in some agreed order. The frequency of genotype $s$ in this listing is
denoted $g_s$. More explicitly we have

$$\begin{aligned}
x_{ku} &= \text{frequency of the allele } A_{ku}, \\
y_{ku,\ell v} &= \text{frequency of the two-locus gamete } (A_{ku}, A_{\ell v}), \\
z_q &= \text{frequency of the $q$th $H$-locus gamete}, \\
c_i &= \text{frequency of the $i$th $K$-locus gamete}, \\
g_s &= \text{frequency of the $s$th genotype in the genome}.
\end{aligned} \tag{7.1}$$

When referring to two or more $H$-locus gametes we use suffixes $p$, $q$ and
$r$; when referring to two or more $K$-locus gametes we use suffixes $h$, $i$
and $j$. Clearly the frequencies $x_{ku}$, $y_{ku,\ell v}$ and $z_q$ can be
found by appropriate summation. Thus, for example,

$$z_q = \sum_{i \in S_q} c_i, \tag{7.2}$$

where $S_q$ is the set of all $K$-locus gametes containing the same alleles at
the loci of the $H$-locus system as the $q$th $H$-locus gamete.



7.3 The Random Mating Case 

7.3.1 Linkage Disequilibrium, Means and Variances 

Our initial analysis assumes random mating. This implies a focus 
on gametes and their frequencies, as well as on measures of linkage 
disequilibrium. 

The concept of linkage disequilibrium was introduced in Chapter 2
for two-locus, two-allele systems. We use the symbol $D$ for two-locus
disequilibria: Thus

$$D_{ku,\ell v} = y_{ku,\ell v} - x_{ku} x_{\ell v}. \tag{7.3}$$

Higher-order linkage disequilibria have been defined by Geiringer (1944), 
Bennett (1954) and Slatkin (1972). Although linkage disequilibrium is a 
major concern of this chapter, we shall not introduce these measures here. 
We note, however, that if the frequency of every $K$-locus gamete is the
product of the frequencies of its constituent alleles, all these measures are 
zero, and that large linkage disequilibrium values imply that gametic fre- 
quencies cannot be found, even approximately, by forming the products of 
the corresponding allele frequencies. 
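
The summation (7.2) and the definition (7.3) translate directly into code.
The following is a minimal sketch under our own conventions, which are
assumptions rather than part of the text: a $K$-locus gamete is represented
as a tuple of allele labels, loci and alleles are indexed from 0, and the
function names are hypothetical.

```python
from itertools import product


def subsystem_frequencies(c, loci):
    """z_q of (7.2): sum c_i over all K-locus gametes i that carry the
    alleles of the subsystem gamete q at the chosen loci."""
    z = {}
    for gamete, freq in c.items():
        q = tuple(gamete[k] for k in loci)
        z[q] = z.get(q, 0.0) + freq
    return z


def disequilibrium(c, k, u, l, v):
    """D_{ku,lv} = y_{ku,lv} - x_{ku} x_{lv}, as in (7.3)."""
    x_ku = subsystem_frequencies(c, (k,)).get((u,), 0.0)
    x_lv = subsystem_frequencies(c, (l,)).get((v,), 0.0)
    y = subsystem_frequencies(c, (k, l)).get((u, v), 0.0)
    return y - x_ku * x_lv


def product_form(allele_freqs):
    """K-locus gamete frequencies formed as products of allele frequencies."""
    c = {}
    for gamete in product(*(range(len(f)) for f in allele_freqs)):
        freq = 1.0
        for k, u in enumerate(gamete):
            freq *= allele_freqs[k][u]
        c[gamete] = freq
    return c


if __name__ == "__main__":
    # An unstructured system: every gamete frequency is a product.
    unstructured = product_form([[0.3, 0.7], [0.6, 0.4], [0.1, 0.5, 0.4]])
    # A highly structured system: only two complementary gametes occur.
    structured = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
    print(disequilibrium(unstructured, 0, 0, 2, 1))
    print(disequilibrium(structured, 0, 0, 1, 0))
```

The product-form frequencies give two-locus disequilibria that vanish up to
rounding error, while the two-gamete population gives disequilibria of
magnitude one quarter, so that products of allele frequencies are then a poor
guide to gametic frequencies.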

For the random-mating case it is convenient to think of the genotype
as being made up of two gametes, one derived maternally and one
paternally. Each pair of $K$-locus gametes then defines some genotype: For
this random-mating analysis we denote the genotype defined by gametes $i$ and
$j$ as genotype $(i, j)$. The value of some measured characteristic of an
$(i, j)$ individual is written $m_{ij}$: It is assumed that there is no
environmentally caused variation in this measurement. A particular case is
the fitness of an $(i, j)$ individual, given the special notation $w_{ij}$.
The marginal value $m_i$ of gamete $i$ for the character in question is
defined by

$$m_i = \sum_j c_j m_{ij}, \tag{7.4}$$

and the mean value $\bar{m}$ for the entire population is given by

$$\bar{m} = \sum_i c_i m_i = \sum_i \sum_j c_i c_j m_{ij}. \tag{7.5}$$

The total genetic variance $\sigma^2$ for the character is

$$\sigma^2 = \sum_i \sum_j c_i c_j (m_{ij} - \bar{m})^2, \tag{7.6}$$

and the additive component of this, defined in Section 7.3.3, is denoted
$\sigma_A^2$.

Similarly, each pair of $H$-locus gametes defines an $H$-locus genotype, the
(marginal) value $m_{pq}$ for an individual $(p, q)$ being defined by
averaging over $K$-locus genotypes. Explicitly,

$$m_{pq} = \frac{\sum_{i \in S_p} \sum_{j \in S_q} c_i c_j m_{ij}}{z_p z_q}. \tag{7.7}$$

The marginal value for gamete $p$ is

$$m_p = \sum_q z_q m_{pq} = \frac{\sum_{i \in S_p} \sum_j c_i c_j m_{ij}}{z_p}, \tag{7.8}$$

and it follows that

$$\sum_p z_p m_p = \sum_i \sum_j c_i c_j m_{ij} = \bar{m}. \tag{7.9}$$

The total variance in the character for the $H$-locus subsystem is

$$\sum_p \sum_q z_p z_q (m_{pq} - \bar{m})^2, \tag{7.10}$$



and this can also be divided into additive and non-additive components.
This is true in particular when $H = 1$, for which the $z_p$ are the gene
frequencies $x_{k1}, x_{k2}, \ldots$ at a single locus. Since the single-locus
case is particularly important, we now exhibit the variance partition for it.
The marginal value for the genotype $A_{ku}A_{kv}$ is defined (as in (7.7)) by

$$m_{ku,kv} = \frac{\sum_{i \in S_{ku}} \sum_{j \in S_{kv}} c_i c_j m_{ij}}{x_{ku} x_{kv}}, \tag{7.11}$$

where $S_{ku}$ is the set of all $K$-locus gametes containing the allele
$A_{ku}$. From this we may compute the marginal value of $A_{ku}$ as

$$m_{ku} = \sum_v m_{ku,kv} x_{kv} \tag{7.12}$$

and the marginal average excess of $A_{ku}$ by

$$a_{ku} = m_{ku} - \bar{m}. \tag{7.13}$$

Then the locus $k$ marginal variance for the character, namely

$$\sum_u \sum_v x_{ku} x_{kv} (m_{ku,kv} - \bar{m})^2, \tag{7.14}$$

may be partitioned into an additive component $\sigma_A^2(k)$ and a dominance
component $\sigma_D^2(k)$ defined respectively by

$$\sigma_A^2(k) = 2 \sum_u x_{ku} a_{ku}^2, \tag{7.15}$$

$$\sigma_D^2(k) = \sum_u \sum_v x_{ku} x_{kv} d_{k,uv}^2, \tag{7.16}$$

where

$$d_{k,uv} = m_{ku,kv} - m_{ku} - m_{kv} + \bar{m}. \tag{7.17}$$

For $H > 1$, additive $\times$ additive and other higher-order components of
variance arise: These are of interest only in Section 7.5, and we defer further 
discussion of them until that section. 
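
The single-locus partition can be verified numerically from the full set of
gamete frequencies. The following is a minimal sketch that follows (7.11) to
(7.17) step by step. The data layout (a dictionary of gamete tuples and a
symmetric value function `m`), the function name and the example values are
our own assumptions, chosen only for illustration.

```python
def single_locus_partition(c, m, k):
    """Variance partition (7.11)-(7.17) at locus k.

    c maps each K-locus gamete (a tuple of allele labels) to its frequency,
    and m(i, j) returns the value of genotype (i, j), with m(i, j) == m(j, i).
    Every allele at locus k is assumed to have positive frequency.
    Returns (marginal variance, additive component, dominance component).
    """
    x, weighted = {}, {}
    for gi, ci in c.items():
        x[gi[k]] = x.get(gi[k], 0.0) + ci
        for gj, cj in c.items():
            key = (gi[k], gj[k])
            weighted[key] = weighted.get(key, 0.0) + ci * cj * m(gi, gj)
    alleles = list(x)
    m_geno = {(u, v): weighted[(u, v)] / (x[u] * x[v])
              for u in alleles for v in alleles}                      # (7.11)
    m_bar = sum(weighted.values())                                    # (7.5)
    m_allele = {u: sum(m_geno[(u, v)] * x[v] for v in alleles)
                for u in alleles}                                     # (7.12)
    excess = {u: m_allele[u] - m_bar for u in alleles}                # (7.13)
    total = sum(x[u] * x[v] * (m_geno[(u, v)] - m_bar) ** 2
                for u in alleles for v in alleles)                    # (7.14)
    additive = 2 * sum(x[u] * excess[u] ** 2 for u in alleles)        # (7.15)
    dominance = sum(
        x[u] * x[v]
        * (m_geno[(u, v)] - m_allele[u] - m_allele[v] + m_bar) ** 2
        for u in alleles for v in alleles)                            # (7.16), (7.17)
    return total, additive, dominance


if __name__ == "__main__":
    # Two loci, two alleles each, with some linkage disequilibrium.
    c = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

    def m(i, j):
        n0, n1 = i[0] + j[0], i[1] + j[1]   # copies of allele 1 at each locus
        return 1.0 + 0.5 * n0 + 0.3 * i[0] * j[0] + 0.2 * n0 * n1

    total, additive, dominance = single_locus_partition(c, m, k=0)
    print(total, additive + dominance)
```

For any symmetric value function the two printed numbers agree up to rounding
error, which is the statement that (7.14) is the sum of (7.15) and (7.16).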

7.3.2 Recurrence Relations for Gametic Frequencies 

The genetic evolution of a random-mating population is described most
efficiently by recurrence relations for the various $K$-locus gamete
frequencies. These relations generalize those in (2.94), and depend on
genotypic fitnesses and the recombination pattern between loci. We define the
latter through the function $f(i, j \to h)$, defined as the probability that a
randomly chosen one of the two gametes formed by recombination in gametes $i$
and $j$ is gamete $h$. Then if $c_i$ and $c_i'$ are the frequencies of gamete
$i$ in consecutive generations,

$$\bar{w} c_i' = w_i c_i - \sum{}^{(1)} w_{ij} c_i c_j f(i, j \to h) + \sum{}^{(2)} w_{hj} c_h c_j f(h, j \to i). \tag{7.18}$$

Here $\sum^{(1)}$ denotes a summation over all gametes $h$ and $j$ with
$i \ne j \ne h$ and $\sum^{(2)}$ denotes summation over all gametes $h$ and
$j$ with $h, j \ne i$. Summation of (7.18) over all $i$ in $S_p$ yields

$$\bar{w} z_p' = w_p z_p - \sum{}^{(1)} w_{pq} z_p z_q f(p, q \to r) + \sum{}^{(2)} w_{rq} z_r z_q f(r, q \to p), \tag{7.19}$$

where $\sum^{(1)}$ and $\sum^{(2)}$ have meanings parallel to those just given.
The similarity in form between (7.18) and (7.19) shows that the recurrence
relations for $H$-locus gametes written down by formal analogy with (7.18) do
indeed provide the correct recurrence values, but with one restriction: The
fitnesses $w_{pq}$ (unlike the full multilocus values $w_{ij}$) are not fixed
but normally change from generation to generation. It follows that the
recurrence relation (7.19) can be used to predict $H$-locus gametic
frequencies only one generation in advance. For long-term predictions the
full system (7.18) must be used.
We have noted in Chapter 2 that for systems where fitness depends on
the alleles at two loci, decreases in mean fitness can occur and the mean
fitness increase theorem is not valid. If, however, the loci involved are in
linkage equilibrium, a decrease in mean fitness cannot occur. This is shown
by setting $c_1 c_4 - c_2 c_3 = 0$ in (2.94). Doing this yields the
“single-locus type” (2.7), and this implies non-decrease in mean fitness, at
least for one generation. The same holds for many loci: If all linkage
disequilibria of all orders are zero, (7.18) will also reduce to equations in
single locus form, and the same conclusion holds. We emphasize that this is a
condition on gamete frequencies, not on the fitnesses $w_{ij}$.

We turn now to equilibrium behavior. The equation $c_i' = c_i$ implies that
$z_q' = z_q$, a conclusion that is put more usefully in contrapositive form:
If $z_q' \ne z_q$, the full $K$-locus system cannot be in equilibrium. This is
a conclusion concerning the full system derived from a subset of loci. In
particular if $H = 1$ and only two alleles $A_{k1}$ and $A_{k2}$ can occur at
locus $k$, the full system cannot be in equilibrium unless the marginal
fitness of the heterozygote $A_{k1}A_{k2}$ is outside the range of those of
the homozygotes $A_{k1}A_{k1}$ and $A_{k2}A_{k2}$ and unless further the
frequency of $A_{k1}$ is at the value predicted by an equation of the form of
(1.31).

Despite the above results, the equations $z_q' = z_q$ $(q = 1, 2, \ldots, Q)$
do not necessarily imply that the full $K$-locus system is in equilibrium.
Further, if indeed $c_i' = c_i$, so that the full $K$-locus system is at an
equilibrium point, the stability of this equilibrium cannot necessarily be
gauged by the values of $w_{pq}$. It is possible for the values of the
$w_{pq}$ to suggest stability of the $K$-locus equilibrium and yet for that
equilibrium to be unstable. For examples see Ewens and Thomson (1977): Clearly
subsystem behavior does not necessarily give correct information about the
full system.



7.3.3 Components of Variance 

In this section we compute the full $K$-locus “additive genetic” and the
“additive gametic” components of the variance (7.6), continuing to assume 
random mating. We show that the two are identical and then consider 
their relations to the sum of the single locus additive genetic variances. 
The additive genetic variance was defined for one locus (and two alleles) in 
(1.9) and for two loci in (2.102), while the additive gametic variance was 
defined after (2.106). We consider first the additive gametic variance. For
an arbitrary number of loci, the total gametic variance for any character is
defined as

$$2 \sum_i c_i (m_i - \bar{m})^2, \tag{7.20}$$

and is a measure of the differences in marginal gametic values for this
measurement. The additive gametic variance measures the extent to which
this variance can be accounted for by additive effects of genes. We attach
an additive parameter $\alpha_{ku}$ to the allele $A_{ku}$, where since many
loci are now involved the $\alpha_{ku}$ are subject to the constraints
(generalizing those in (2.101))

$$\sum_u \alpha_{ku} x_{ku} = 0 \tag{7.21}$$

for each $k$. Subject to the constraints in (7.21) we now minimize

$$2 \sum_i c_i \Bigl(m_i - \bar{m} - \sum \alpha_{ku}\Bigr)^2, \tag{7.22}$$

with respect to the $\alpha_{ku}$, the inner summation being over all alleles
in gamete $i$. The minimizing values $\hat{\alpha}_{ku}$ of the $\alpha_{ku}$
satisfy the equation

$$x_{ku} \hat{\alpha}_{ku} + \sum_{\ell \ne k} \sum_v y_{ku,\ell v} \hat{\alpha}_{\ell v} = \sum{}^{(ku)} c_i (m_i - \bar{m}), \tag{7.23}$$

where $\sum^{(ku)}$ implies summation over all gametes containing the allele
$A_{ku}$. Except in degenerate cases, (7.23) is a system of linear equations
for the $\hat{\alpha}_{ku}$ and thus has a unique solution. Standard
least-squares theory now shows that the additive gametic variance is

$$2 \sum_{k,u} \Bigl(\sum{}^{(ku)} c_i (m_i - \bar{m})\Bigr) \hat{\alpha}_{ku}. \tag{7.24}$$



Similarly the additive genetic variance is found by attempting to
account for genotypic values as far as possible by the additive effects of
alleles. Subject to the constraints in (7.21), this is done by minimizing
the expression

$$\sum_i \sum_j c_i c_j \Bigl(m_{ij} - \bar{m} - \sum \alpha_{ku}\Bigr)^2, \tag{7.25}$$

with respect to the $\alpha_{ku}$, the inner summation now being over all
alleles in the genotype $(i, j)$. If $A_{\ell v}$ occurs twice in any
genotype, the contribution $\alpha_{\ell v}$ is counted twice in the
summation. The minimizing values $\hat{\alpha}_{ku}$ can again be computed,
and it is found (Ewens and Thomson (1977)) that
these also satisfy (7.23) and that the sum of squares removed reduces to 
the expression in (7.24). Thus the additive genetic and additive gametic 
variances are equal. To find either we thus compute whichever is easier in 
any given circumstance and note that the identity of the two reinforces 
the view, given by Crow and Kimura (1970), that the expression “genic 
variance” should be used for both. 
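
The identity of the two variances can be checked numerically in a small case.
The sketch below is restricted, as an assumption of ours, to two alleles
(labeled 0 and 1) at each locus, so that the constraints (7.21) can be
absorbed by writing the additive effects at locus $k$ in terms of a single
difference $\beta_k = \alpha_{k2} - \alpha_{k1}$. The two least-squares
problems (7.22) and (7.25) are then solved separately by Gaussian elimination.
The function names, gamete frequencies and genotypic values are hypothetical.

```python
from itertools import product


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for cc in range(col, n + 1):
                M[r][cc] -= factor * M[col][cc]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][cc] * x[cc] for cc in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def additive_variances(c, m):
    """Return (additive gametic variance, additive genetic variance).

    c maps gametes (tuples of 0/1 allele labels) to frequencies and m(g, h)
    is the symmetric value of the genotype formed by gametes g and h.
    """
    gametes = list(c)
    K = len(gametes[0])
    p = [sum(c[g] * g[k] for g in gametes) for k in range(K)]
    marg = {g: sum(c[h] * m(g, h) for h in gametes) for g in gametes}  # (7.4)
    m_bar = sum(c[g] * marg[g] for g in gametes)                       # (7.5)

    # Gametic fit, cf. (7.22): regress m_i - m_bar on centred indicators.
    A1 = [[sum(c[g] * (g[k] - p[k]) * (g[l] - p[l]) for g in gametes)
           for l in range(K)] for k in range(K)]
    b1 = [sum(c[g] * (g[k] - p[k]) * (marg[g] - m_bar) for g in gametes)
          for k in range(K)]
    gametic = 2 * sum(bk * rk for bk, rk in zip(solve(A1, b1), b1))

    # Genotypic fit, cf. (7.25): regress m_ij - m_bar on centred allele counts.
    def count(g, h, k):
        return g[k] + h[k] - 2 * p[k]

    pairs = [(g, h) for g in gametes for h in gametes]
    A2 = [[sum(c[g] * c[h] * count(g, h, k) * count(g, h, l) for g, h in pairs)
           for l in range(K)] for k in range(K)]
    b2 = [sum(c[g] * c[h] * count(g, h, k) * (m(g, h) - m_bar)
              for g, h in pairs) for k in range(K)]
    genetic = sum(bk * rk for bk, rk in zip(solve(A2, b2), b2))
    return gametic, genetic


if __name__ == "__main__":
    freqs = (0.20, 0.05, 0.10, 0.15, 0.05, 0.10, 0.05, 0.30)
    c = dict(zip(product((0, 1), repeat=3), freqs))

    def m(g, h):
        return (1.0 + 0.5 * (g[0] + h[0]) + 0.2 * (g[1] + h[1])
                + 0.4 * g[0] * h[0] + 0.3 * (g[0] + h[0]) * (g[2] + h[2]))

    print(additive_variances(c, m))
```

The two returned values agree up to rounding error even though the example
includes dominance, epistasis and linkage disequilibrium, as the argument
above leads us to expect under random mating.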

It is not, in general, easy to determine the difference between the true
additive genetic variance, given by (7.24), and the sum
$\sum_k \sigma_A^2(k)$ of the single-locus values, where $\sigma_A^2(k)$ is
defined in (7.15). In general the two values are not equal. If, however, all
possible two-locus coefficients of linkage disequilibrium are zero, so that

$$y_{ku,\ell v} = x_{ku} x_{\ell v} \tag{7.26}$$

for all $k$, $\ell$, $u$, $v$, then equality does hold. This can be seen
immediately from (7.23). When (7.26) holds, the second term on the left-hand
side of (7.23) is zero, because of the constraints in (7.21), so that
$\hat{\alpha}_{ku} = a_{ku}$ and (7.24) reduces immediately to
$\sum_k \sigma_A^2(k)$. More generally, Avery and Hill (1978) show that
whether (7.26) holds or not, if only two alleles are possible at each locus
and the measurements $m_{ij}$ can be expressed as the sum of single-locus
genotypic contributions, the true additive genetic variance is

$$\sigma_A^2 = \sum_k V_A(k) + 4 \sum\sum_{k < \ell} (\alpha_{k1} - \alpha_{k2})(\alpha_{\ell 1} - \alpha_{\ell 2}) D_{k\ell}, \tag{7.27}$$

where $V_A(k) = 2 \sum_u x_{ku} \alpha_{ku}^2$, with $\alpha_{ku}$ being the
true multilocus average effect of $A_{ku}$ and $D_{k\ell}$ is the coefficient
of linkage disequilibrium between loci $k$ and $\ell$. We check that if
$D_{k\ell} = 0$ for all $k$ and $\ell$, the true additive genetic variance and
the sum of the single-locus values are identical.

The terms in the second sum in (7.27) can be both positive and negative, 
and thus considerable cancellation is possible in the summation. However, 
this might not occur in highly structured genetic situations, when the sec- 
ond term can dominate the first term. It is therefore of some interest to 
assess the circumstances under which each of these cases occurs. Impor- 
tant progress was made on this point by Bulmer (1976). Bulmer simulated 
a genetic system in which the value of a given character is determined by 
the alleles at twelve loci as well as an independent environmental
component, the contributions being additive across loci with $A_{k1}A_{k1}$,
$A_{k1}A_{k2}$ and $A_{k2}A_{k2}$ contributing 0, 1 and 2 respectively to the
measurement. The
genetic contribution to the character thus ranges from 0 to 24, and there 
are no dominance contributions to the total genetic variance. 

We are interested here in the effects of stabilizing, disruptive and
directional selection schemes on the additive variance $\sigma_A^2$ and its
two components as given in (7.27). If $x_{k1}$ and $x_{k2}$ are both moderate,
$\sigma_A^2(k)$ was found by Bulmer to be approximately 0.4 or 0.5, so that
$\sum_k \sigma_A^2(k)$ is approximately 5 or 6. Under both stabilizing and
symmetrical disruptive selection $\sum_k \sigma_A^2(k)$ takes values of this
order of magnitude, and thus provides a useful standard against which the
value of the second term on the right-hand side of (7.27) can be compared.
Under stabilizing selection this term is approximately $-1.5$ or $-2.0$. Thus
linkage disequilibrium lowers the true additive genetic variance somewhat
from the value calculated without disequilibrium. However, under disruptive
selection this term is extremely large, being ten or eleven times larger
than $\sum_k \sigma_A^2(k)$ itself. It is clear why this is so. Under strong
disruptive selection, two gametes, one consisting entirely of $A_{k1}$ genes
($k = 1, 2, \ldots, 12$), the other of $A_{k2}$ genes ($k = 1, 2, \ldots, 12$),
occur in high frequency, and the genetic system acts very much like a
single-locus system with two alleles having values 0, 12 and 24 for the
three genotypes. For such a system the additive genetic variance is 72 if
the two alleles are of equal frequency, and the contribution to variance of
the linkage disequilibrium terms makes up most of the difference between
this value and that given by $\sum_k \sigma_A^2(k)$.
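The arithmetic behind this comparison can be checked directly. The sketch below is illustrative only: it assumes allele frequencies of exactly 1/2 at every locus and uses the standard one-locus formula $2pqa^2$ for the additive variance of a biallelic locus whose genotypes contribute 0, $a$ and $2a$; the function name is ours, not Bulmer's.

```python
def additive_variance(p: float, a: float) -> float:
    """Additive genetic variance 2*p*q*a**2 of a biallelic locus whose three
    genotypes contribute 0, a and 2a (no dominance, random mating)."""
    return 2.0 * p * (1.0 - p) * a * a

# Twelve loci treated separately, each contributing 0, 1 or 2.
sum_of_locus_variances = sum(additive_variance(0.5, 1.0) for _ in range(12))

# Only two complementary gametes present: the system behaves like a single
# locus whose three genotypes take the values 0, 12 and 24.
super_locus_variance = additive_variance(0.5, 12.0)

# What the linkage disequilibrium terms must supply.
disequilibrium_part = super_locus_variance - sum_of_locus_variances
ratio = disequilibrium_part / sum_of_locus_variances

print(sum_of_locus_variances, super_locus_variance, ratio)
```

With equal allele frequencies the single-locus sum is 6, the two-gamete system gives 72, and the difference is eleven times the single-locus sum, in line with the "ten or eleven times" reported in the text.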

The relation between $\sigma_A^2$ and $\sum_k \sigma_A^2(k)$ under directional
selection is not so easily arrived at intuitively. In Bulmer’s simulations
$\sum_k \sigma_A^2(k)$ is approximately 25% less than $\sigma_A^2$ during the
rather small number of generations before fixation of the favored allele at
each locus.



7.3.4 Particular Models 

It is clear from the above that the degree of linkage disequilibrium in a 
genetic system influences considerably the extent to which multilocus prop- 
erties of the system can be determined from a consideration of single-locus 
properties. It is thus important to assess the extent to which linkage
disequilibria will arise in natural populations, particularly at equilibrium,
and to make a partial assessment of this we now consider equilibrium
properties for certain special fitness models. We continue to assume random
mating and the only character we consider is fitness. Since the models 
considered often have special properties, for example of symmetry, some 
caution is necessary in drawing inferences from them: We gain some gen- 
erality by considering four different models, noting in particular when the 
same inference is suggested by all four. 

Three of these models require the definition of a “fitness contribution”
from each locus given, for the genotype $A_{ku}A_{kv}$ at locus $k$, by

    $w_k(u,v)$.    (7.28)

We assume the $w_k(u,v)$ are such that for a single-locus system with fitness
parameters (7.28) there exists a unique internal stable equilibrium point
with the frequency of $A_{ku}$ being $x_{ku}$.

Assume now the $w_k(u,v)$ are used in some way to define fitnesses for the
$K$-locus genotypes. Any equilibrium point of such a system for which the
frequency of the gamete $(A_{1u_1}, A_{2u_2}, A_{3u_3}, \ldots)$ is
$x_{1u_1}x_{2u_2}x_{3u_3}\cdots$ is called a “product” equilibrium: We shall
be particularly interested in stability conditions for equilibria of this
type. All coefficients of linkage disequilibrium, of all orders, are zero at
such an equilibrium. Roux (1974) and Karlin (1977a) called these
“Hardy-Weinberg” equilibria, but we prefer here the term “product” to avoid
confusion with the slightly different single-locus meaning of the term
“Hardy-Weinberg frequencies”.

Suppose first that in the $K$-locus system, the fitness of any individual is
in the additive form

    $\sum_k w_k(u,v)$    (7.29)

where the sum is taken over the individual's genotypes at all loci. An
example of such a scheme (with different notation) is given in (6.21): The
notational equivalence is

    $\alpha_1 = w_1(1,1)$,  $\alpha_2 = w_1(1,2)$,  $\alpha_3 = w_1(2,2)$,
    $\beta_1 = w_2(1,1)$,  $\beta_2 = w_2(1,2)$,  $\beta_3 = w_2(2,2)$.    (7.30)

A generalization of the discussion following (6.21) shows that mean fitness 
depends on gene frequencies only and is thus nondecreasing from generation 
to generation: This, and the class of models discussed by Lyubich (1992), 
together form perhaps the only broad class of models of practical interest for 
which the mean fitness increase theorem holds for an arbitrary value of K. 
Further, Karlin and Liberman (1979) found that the product equilibrium is 
the only equilibrium of the $K$-locus system for nonzero recombination rates,
and it is globally stable at least for the case K = 2. At this equilibrium 
the additive genetic variance in any character can be found as the sum of 
single-locus values, but even though fitnesses are additive over loci, this is 
not generally true for the character “fitness” for nonequilibrium values. 

For the second model it is supposed that fitness is in the multiplicative
form

    $\prod_k w_k(u,v)$.    (7.31)

This scheme generalizes (6.28), to which it reduces with the identifications 
(7.30). Considerable speculation on the behavior of real genetic systems 
has followed from investigation of this model, and we therefore consider its 
properties in some detail. 

The product equilibrium exists for this model but is not a maximum 
point of mean fitness. Thus mean fitness can decrease in the neighborhood 
of the equilibrium, and the mean fitness increase theorem fails. We show 
later that the mean fitness at equilibria other than the product equilibrium 
is rather higher than at the product equilibrium point. 

A general analysis is difficult and unrevealing, so we consider a simplified
case. Suppose that the recombination fraction between adjoining loci is $R$,
that there is no interference, and that for all $k$, $u$ and $v$

    $w_k(u,u) = 1 - a$,   $w_k(u,v) = 1$  $(u \neq v)$    (7.32)

where $a > 0$. For $K = 2$ this is a particular case of (6.28), and then (6.33)
shows that the product equilibrium is stable only if

    $R > a^2/4$.    (7.33)

When (7.33) is violated there exist complementary pairs of stable equilibria,
with

    freq$(A_{11}A_{21})$ = freq$(A_{12}A_{22})$ = $\frac14 \pm \frac14\{1 - 4R/a^2\}^{1/2}$,
    freq$(A_{11}A_{22})$ = freq$(A_{12}A_{21})$ = $\frac14 \mp \frac14\{1 - 4R/a^2\}^{1/2}$,    (7.34)

    $D = \pm\frac14\{1 - 4R/a^2\}^{1/2}$.    (7.35)
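The threshold (7.33) and the equilibrium value (7.35) can be checked numerically for $K = 2$. The sketch below is not taken from the text: it assumes the standard two-locus random-mating recursion for gamete frequencies (marginal gamete fitnesses, with a recombination term proportional to $R$ times $D$ times the double-heterozygote fitness), applies the fitnesses (7.32), and starts from a small, arbitrarily chosen disequilibrium. All function and variable names are ours.

```python
import math

def fitness(g: int, h: int, a: float) -> float:
    """Multiplicative fitness (7.32) of the genotype formed by gametes g and h.
    Gametes are coded 0..3; bit 0 is the allele at locus 1, bit 1 at locus 2."""
    f = 1.0
    for locus in (0, 1):
        if ((g >> locus) & 1) == ((h >> locus) & 1):   # homozygous at this locus
            f *= 1.0 - a
    return f

def next_generation(x, a, R):
    """One generation of the assumed two-locus recursion for gamete frequencies."""
    marginal = [sum(x[h] * fitness(g, h, a) for h in range(4)) for g in range(4)]
    mean_w = sum(x[g] * marginal[g] for g in range(4))
    D = x[0] * x[3] - x[1] * x[2]          # gametes 0 and 3 are complementary
    sign = (1.0, -1.0, -1.0, 1.0)          # recombination erodes D
    w_double_het = fitness(0, 3, a)        # equals 1 under (7.32)
    return [(x[g] * marginal[g] - sign[g] * R * w_double_het * D) / mean_w
            for g in range(4)]

def equilibrium_D(a, R, generations=5000, D0=0.01):
    """Iterate from gene frequencies 1/2 and a small initial disequilibrium D0."""
    x = [0.25 + D0, 0.25 - D0, 0.25 - D0, 0.25 + D0]
    for _ in range(generations):
        x = next_generation(x, a, R)
    return x[0] * x[3] - x[1] * x[2]

a = 0.2                                   # so that a^2/4 = 0.01
D_below = equilibrium_D(a, R=0.005)       # (7.33) violated
D_above = equilibrium_D(a, R=0.02)        # (7.33) satisfied
predicted = 0.25 * math.sqrt(1.0 - 4.0 * 0.005 / a ** 2)
print(D_below, predicted, D_above)
```

Under these assumptions the iteration settles on the value of $D$ given by (7.35) when $R < a^2/4$ and decays to $D = 0$ when $R > a^2/4$.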
For $K = 3$, Feldman et al. (1974) showed that the product equilibrium
is stable whenever (7.33) holds; when (7.33) does not hold, there exist four
stable equilibria analogous to (7.34). Curiously, for a small range of $R$
values ($0.01 < R < 0.0104272$ when $a = 0.2$), in excess of the bound $a^2/4$,
these equilibria continue to be stable, but for sufficiently large $R$
($R > 0.0104272$ when $a = 0.2$) the product equilibrium is the only stable
equilibrium.

For $K = 5$, Lewontin (1964a, b) showed by simulation for the case $a = 0.5$
that whereas the product equilibrium is stable for $R > 0.0625$, the bound
given by (7.33), there exist several stable equilibria exhibiting linkage
disequilibrium when $0 < R < 0.065$. There will thus be a small interval of
$R$ values for which both “$D = 0$” and “$D \neq 0$” stable equilibria exist.
For $K = 36$, Franklin and Lewontin (1970) show by simulation that, when
$a = 0.1$, a large number of stable equilibria exist when $0 < R < 0.0025$;
note that 0.0025 is the bound given by (7.33). All of these are in linkage
disequilibrium. For $0.0025 < R < 0.01$ approximately, these stable equilibria
persist with a stable product equilibrium also, while for $R > 0.01$
approximately, only the product equilibrium is stable. These conclusions
jointly suggest that the range of $R$ values for which there exist linkage
disequilibrium stable equilibria increases steadily over the bound $a^2/4$ as
the number of loci in the system increases. However, a very powerful theorem
of Roux (1974) shows that, for any multiplicative fitness scheme, the
conditions on recombination fraction values that ensure stability of the
pairwise product equilibrium for all adjacent loci are sufficient to ensure
stability of the product equilibrium in the complete $K$-locus multiplicative
system. In particular, in the present example, (7.33) is sufficient for the
stability of the $K$-locus product equilibrium for all $K$. This is an
important conclusion and, in conjunction with the simulation conclusions of
Franklin and Lewontin, suggests that for very large $K$ a wide range of $R$
values will exist for which stable linkage equilibrium and linkage
disequilibrium equilibria occur. It is important to note that Roux’s theorem
requires the condition (7.33) and its generalizations to hold for all pairs
of adjacent loci. This was confirmed by Feldman et al. (1974) for the case
$K = 3$, $a = 0.2$. For this value of $a$ the inequality (7.33) becomes
$R > 0.01$, and the product equilibrium is not stable if the recombination
fraction between loci 1 and 2 is 0.0099 and that between loci 2 and 3 is
0.0103.

Lewontin (1964a) and subsequently Franklin and Lewontin (1970) noticed two
further important properties of stable points of 5- and 36-locus systems
respectively. The first is that loci far apart on the chromosome can be held
in linkage disequilibrium when the recombination fraction between them is
considerably in excess of the limit set by (7.33). This occurs because of
loci in the system segregating between these end loci. Thus, for example,
when $K = 5$ and $R = 0.065$, loci 1 and 5 are separated by four intervals
($4R = 0.26$). Second, the value of $D$ for adjacent interior loci is greater
than the value predicted by (7.35). This effect is most marked for large
values of $R$. Thus in the 5-locus case the equilibrium value of $D$ between
loci 2 and 3 is about 1.06 times as large as the value predicted by (7.35)
for $R = 0.01$ and about 2.91 times as large for larger $R$. In the 36-locus
case with $R = 0.002$, the value of $D$ from (7.35) is 0.112, whereas Franklin
and Lewontin found an average value of $|D|$ for adjacent loci of 0.22. Since
necessarily $|D| \le 0.25$, this is an extremely large value. For
$R > 0.0025$, two-locus theory does not predict stable linkage disequilibrium
equilibria, and yet for $R = 0.004$ the average value of $|D|$ for adjacent
loci was found to be as high as 0.185. The average of $D^2$ for all pairs of
loci correspondingly decreases from about 0.05 at $R = 0.0027$ to 0.025 at
$R = 0.007$ and essentially zero for $R > 0.01$.

These latter observations (for $K = 36$) arise because, at the equilibria
of the system investigated, the equilibrium gametic frequencies for small $R$
arise in a highly structured form, with two gametes each having frequency
of about 0.4 and all $10^9$ remaining gametes having total frequency of about
0.2. The two high frequency gametes are complementary in that for the
great majority of loci, they carry the alternative forms of the alleles at
each locus. (If stochastic loss of alleles at some loci had not occurred, these
gametes would be perfectly complementary.) This suggests complex and
interesting behavior for equilibria of multiplicative systems for large $K$, an
argument which is taken up in the final section of this chapter.

A further property of highly structured systems is that the mean fitness
is considerably higher than at a product equilibrium. In the present model
the mean fitness at a product equilibrium is $(1 - \frac12 a)^K$, which is
0.158 for $a = 0.1$, $K = 36$. For equilibria with two almost complementary
gametes present in high frequencies, Franklin and Lewontin (1970) found mean
fitnesses in excess of 0.4. If only two complementary gametes occur, each
with frequency 0.5, each individual is equally likely to be a complete
homozygote (fitness $= (0.9)^{36} = 0.0225$) or a complete heterozygote
(fitness 1), leading to a mean fitness of 0.511. Of course with recombination
between loci, this cannot be an equilibrium value of mean fitness, but the
equilibrium value will not be much less than 0.511, and this agrees with the
observation of Franklin and Lewontin.
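The three numerical values quoted in this paragraph follow from (7.32) with $a = 0.1$ and $K = 36$. A short check, using variable names of our own choosing:

```python
a, K = 0.1, 36

# Product equilibrium: each locus is homozygous with probability 1/2, so the
# expected per-locus factor is (1 - a)/2 + 1/2 = 1 - a/2, independently over loci.
mean_fitness_product = (1.0 - a / 2.0) ** K

# Two complementary gametes, each at frequency 1/2: an individual is either a
# complete homozygote or a complete heterozygote, with probability 1/2 each.
fitness_complete_homozygote = (1.0 - a) ** K
mean_fitness_two_gametes = 0.5 * fitness_complete_homozygote + 0.5 * 1.0

print(round(mean_fitness_product, 3),
      round(fitness_complete_homozygote, 4),
      round(mean_fitness_two_gametes, 3))
```

The second mean fitness is more than three times the first, which is the sense in which highly structured equilibria have considerably higher mean fitness.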

The third model we consider is the “generalized nonepistatic” model of
Karlin and Liberman (1978). Here the fitness of any individual is a linear
combination of various multiplicative, additive and neutral components, so
that this model can be thought of as being intermediate between the two
just considered. We consider in detail only the two-locus case in which the
fitness of the genotype $A_{1u}A_{1v}A_{2s}A_{2t}$ is

    $b_1 w_1(u,v)\,w_2(s,t) + b_2 w_1(u,v) + b_3 w_2(s,t) + b_4$.    (7.36)

A sufficient condition that the product equilibrium be stable is that the
recombination fraction between the loci exceed

    $\max_{a,b} \frac{b_1 w_A w_B \lambda_a \mu_b}{b_1 w_A w_B (1 - \lambda_a)(1 - \mu_b) + b_2 w_A (1 - \lambda_a) + b_3 w_B (1 - \mu_b) + b_4}$.    (7.37)
Here the notation is that of (6.36) and (6.37), with

    $a_{uv} = w_1(u,v)$,  $b_{uv} = w_2(u,v)$,  $A_u = A_{1u}$,  $B_s = A_{2s}$,

and with $\lambda_1, \lambda_2, \ldots$, $\mu_1, \mu_2, \ldots$ defined as the
nonunit eigenvalues of the matrices $\{c_{uv}\}$, $\{d_{uv}\}$ respectively.
When $b_2 = b_3 = b_4 = 0$ (so that the model is multiplicative) this
requirement reduces to (6.38), while if $b_1 = 0$ (so that the model is
additive) it reduces to the known requirement $R > 0$. The condition on the
value of $R$ for stability of the product equilibrium is clearly less
stringent than the corresponding condition for the multiplicative case with
$b_2 = b_3 = b_4 = 0$.

The requirement (7.37) can be extended to an arbitrary number of loci, 
and an explicit condition (Karlin and Liberman (1979)) can be found for 
stability of the product equilibrium for general nonepistatic schemes which 
generalizes the condition (20) of Roux (1974) for purely multiplicative 
schemes. Although these conditions are not simple, two important conclu- 
sions emerge. First, the higher the mix of additive components in the fitness, 
as compared to multiplicative components, the less restrictive are the re- 
quirements on recombination for stability of the product equilibrium. This 
generalizes the discussion below (7.37). Second, stability of the product 
equilibrium obtains if there is “enough” recombination, whatever the mix 
may be, and in particular if all loci are unlinked the product equilibrium 
is stable for any generalized nonepistatic scheme. 

We turn finally to a fitness scheme not defined in terms of the $w_k(u,v)$.
If, in the two-locus model (6.29), the parameters $\beta$ and $\gamma$ are
equal, the fitness of any individual depends solely on the number of loci at
which that individual is heterozygous. We consider now the generalization of
such a scheme to an arbitrary number $K$ of loci, assuming two alleles
possible at each locus and that the fitness of an individual heterozygous at
$k$ loci is $\gamma_k$ ($k = 0, 1, 2, \ldots, K$). All results given below for
this model were obtained by Karlin (1977a).

By symmetry, the frequency of each gamete at the product equilibrium is
$2^{-K}$. If this equilibrium is stable for zero recombination, it is the
unique and globally stable equilibrium for all recombination patterns. The
condition for stability with no recombination is




(7.38) 



and for free recombination between all loci is 
K 



k= 0 k 



if - 1 
k 



<0. 



(7.39) 



In line with the discussion above, if (7.38) holds then automatically (7.39)
holds. By contrast, the requirement (7.39) can hold and the requirement
(7.38) not hold, as the multiplicative fitness scheme shows.
Suppose next that $\gamma_0 < \gamma_1 < \cdots < \gamma_k < \cdots < \gamma_K$
so that fitness increases with increasing heterozygosity. Then (7.39) holds,
so that the product equilibrium is stable for free recombination between loci
and, more generally, usually for rather loosely linked loci. However, the
condition (7.38) need not necessarily hold, as the multiplicative case again
shows. But when the $\gamma_k$ increase in a concave fashion, so that
$\gamma_k > \frac12(\gamma_{k-1} + \gamma_{k+1})$, the condition (7.38) will
hold: This was also noted by Slatkin (1972). In this case the additional
fitness component for each additional heterozygous locus decreases with the
number of current heterozygous loci, and the product equilibrium is the
unique and globally stable equilibrium for all recombination patterns. When
the $\gamma_k$ form a convex system, so that
$\gamma_k < \frac12(\gamma_{k-1} + \gamma_{k+1})$, the product equilibrium is
stable only for sufficiently large recombination. Thus Lewontin (1964a)
showed that if $\gamma_0 : \gamma_1 : \gamma_2 : \gamma_3 : \gamma_4 : \gamma_5$
are in the ratios 2 : 3 : 6 : 11 : 18 : 27 and the recombination fraction
between adjacent loci is $R$, the product equilibrium is stable only if
$R > 0.038$.
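The concave/convex distinction used here is a statement about the sign of the second differences of the $\gamma_k$. The following sketch (function names and the third, "diminishing returns" sequence are ours and purely hypothetical) classifies Lewontin's ratios and the multiplicative scheme (7.32), for which $\gamma_k = (1 - a)^{K - k}$.

```python
def second_differences(gamma):
    """gamma[k-1] - 2*gamma[k] + gamma[k+1] for each interior k."""
    return [gamma[k - 1] - 2 * gamma[k] + gamma[k + 1]
            for k in range(1, len(gamma) - 1)]

def shape(gamma):
    """'concave' if gamma_k > (gamma_{k-1} + gamma_{k+1})/2 for all interior k,
    'convex' if the inequality is reversed throughout, otherwise 'neither'."""
    d = second_differences(gamma)
    if all(v < 0 for v in d):
        return "concave"
    if all(v > 0 for v in d):
        return "convex"
    return "neither"

lewontin = [2, 3, 6, 11, 18, 27]                        # ratios quoted in the text

a, K = 0.1, 5
multiplicative = [(1 - a) ** (K - k) for k in range(K + 1)]

diminishing = [1 - 0.5 ** k for k in range(K + 1)]      # hypothetical concave case

print(shape(lewontin), shape(multiplicative), shape(diminishing))
```

Both Lewontin's sequence and the multiplicative sequence are convex, consistent with the statement that for them the product equilibrium requires sufficiently large recombination.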

Suppose now that the product equilibrium is not stable, so that the
$\gamma_k$ do not form a concave sequence and $R$ is sufficiently small. For
$R = 0$ and multiplicative fitness there are many stable equilibria, each one
consisting of a pair of complementary gametic types. For sub-multiplicative
fitnesses (for example, $\gamma_k = (k + 1)^{\alpha}/(K + 1)^{\alpha}$,
$1 < \alpha < K - 1$), there are stable equilibria with a number of gametic
types present, each in moderate frequency. By continuity, for $R \approx 0$
the stable equilibria in the multiplicative model are such that two
complementary gametes occur in high frequency, with all other gametes at
extremely low frequency, as the simulations of Franklin and Lewontin (1970)
suggest. For sub-multiplicative fitnesses a number of moderate frequency
gametes exist at stable equilibrium points.



7.4 Non-Random Mating 

7.4.1 Introduction

The need to analyze genomic data, in particular that from the human 
population, leads to the need for theory which relates to the evolutionary 
properties of the entire genome in populations that do not mate at ran- 
dom. In this section aspects of the multilocus non-random-mating theory 
will be developed, building on the theory given earlier in this chapter for 
the random-mating case and also on the theory given in Section 2.8 for the 
one-locus non-random-mating case. As above, we consider some quantita- 
tive character that is entirely determined for any individual by the genetic 
make-up of that individual. It is convenient to carry out the discussion by 
assuming that the character in question is fitness. This will have the benefit 
of providing theory for evolutionary processes and for the full multilocus 
generalization of the Fundamental Theorem of Natural Selection, described 
below in Section 7.4.5. Nevertheless, much of the discussion applies to any
arbitrary character. It will be found that surprisingly many random-mating 
results continue to hold in the non-random-mating case, although several 
do not. It is also surprising that several one-locus results carry over almost 
immediately to the multilocus case, although again several do not. 

7.4.2 Notation and Theory

We recall the notation of Section 7.2, in particular the notation $g_s$ for
the frequency of the $s$th multilocus genotype, which we call genotype $G_s$.
Since random mating is no longer assumed, we may not assume that $g_s$ can be
found from the frequencies of the two gametes, one maternally derived
and one paternally, constituting the genotype $G_s$.

As in the random-mating case, the multilocus genotype frequencies define,
by summation, the allelic frequencies at any chosen locus. For example,
suppose we single out locus $k$ and some allele $A_{ku}$ at this locus. Then
the frequency $x_{ku}$ of this allele can be found from the various genotype
frequencies $\{g_s\}$ through the formula

    $2x_{ku} = \sum_s c(ku,s)\,g_s$,    (7.40)

where $c(ku,s)$ is the number of times (0, 1 or 2) that the allele $A_{ku}$
occurs in genotype $G_s$.

The fitness of the genotype $G_s$ is denoted $w_s$, so that the population
mean fitness $\bar w$ is $\sum_s g_s w_s$. The frequency of the genotype $G_s$
will, in general, change between the time of conception and the time of
reproduction of the parental generation, because of selective differentials:
The intra-generational change in this frequency is
$\Delta g_s = g_s w_s/\bar w - g_s$. The intra-generational change
$\Delta x_{ku}$ in the frequency of $A_{ku}$ is then found, from (7.40), to be
given by

    $2\Delta x_{ku} = \sum_s c(ku,s)\,\Delta g_s$.    (7.41)

This change is also the inter-generational change in the frequency of
$A_{ku}$, where frequencies in both generations are taken at the time of
conception.

The multilocus genotype fitnesses and frequencies also define “entire
genome” average effects $\{\alpha_{ku}\}$ of the various alleles at the
various loci. These are found by minimizing an expression generalizing that
in (2.57), namely

    $\sum_s g_s \left(w_s - \bar w - \sum \alpha_{ku}\right)^2$.    (7.42)

The inner sum in this expression is the sum of all the average effects of the
various alleles, at all loci, in the genotype $G_s$, with the average effect
of any allele counted in twice if this allele occurs twice in this genotype.
In this minimization procedure the constraints in (7.21) must be applied.

This minimization leads to a set of simultaneous equations defining the
$\{\alpha_{ku}\}$ values, typified by

    $x_{ku}\alpha_{ku} + \sum_v x_{ku,kv}\alpha_{kv} + \sum_{r \neq k}\sum_t Q_{ku,rt}\alpha_{rt} = \bar w\,\Delta x_{ku}$.    (7.43)

In these equations, $x_{ku,kv}$ is the frequency of the ordered $k$-locus
genotype containing alleles $A_{ku}$ and $A_{kv}$. There are various
(equivalent) definitions of $Q_{ku,rt}$. Perhaps the simplest (Lessard
(1997)) is that $Q_{ku,rt}$ is twice the probability that, in an individual
chosen at random, a gene chosen at random from locus $k$ is $A_{ku}$ and a
gene chosen independently at locus $r$ is $A_{rt}$. This collection of
equations generalizes those given in (2.65), and can also be written in
matrix and vector form as

    $(D + P + Q)\alpha = \bar w\Delta$.    (7.44)

Clearly this equation system provides a natural generalization of the equa- 
tion system (2.65), but it should be noted that the meanings of the symbols 
in the two equations differ. In (7.44), for example, the diagonal matrix D 
has all frequencies of all alleles at all loci in the genome, whereas in (2.65) 
the diagonal matrix D contains only the frequencies of the alleles at the 
locus being considered. Similar comments apply to $P$, $\alpha$ and $\Delta$.

It is important to note that the equation system (7.44) has a unique
solution (Lessard (1997)), as do equations (2.65).

The multilocus additive genetic variance, namely the sum of squares
removed by fitting the $\alpha_{ku}$ values in (7.42), can be written in terms
of the average effects in the form

    $\sigma_A^2 = 2\bar w \sum_k \sum_u \alpha_{ku}(\Delta x_{ku})$,    (7.45)

the summation on the right-hand side being over all alleles at all loci in
the genome. This is the natural multilocus generalization of the one-locus
formula (2.63).
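A small numerical illustration may help fix the definitions (7.40) to (7.45). The sketch below uses a hypothetical two-locus, two-allele system with arbitrary genotype frequencies (deliberately not of random-mating form) and arbitrary fitnesses. It assumes that the constraints referred to in (7.21) are of the usual form $\sum_u x_{ku}\alpha_{ku} = 0$ at each locus; with two alleles per locus this reduces the least-squares problem (7.42) to a regression on centred allele counts. All names are ours.

```python
from itertools import product

# Genotype s is coded (n1, n2): copies (0, 1 or 2) of allele 1 at loci 1 and 2.
genotypes = list(product(range(3), repeat=2))
raw = [4, 1, 2, 1, 6, 1, 2, 1, 5]                    # hypothetical weights
g = [r / sum(raw) for r in raw]                      # genotype frequencies g_s
w = [0.8, 1.0, 0.7, 0.9, 1.2, 1.0, 0.6, 1.1, 0.9]    # hypothetical fitnesses w_s

mean_w = sum(gs * ws for gs, ws in zip(g, w))
delta_g = [gs * ws / mean_w - gs for gs, ws in zip(g, w)]

# (7.40) and (7.41) for allele 1 at each locus; allele 2 changes by the negative.
x = [0.5 * sum(gs * s[k] for gs, s in zip(g, genotypes)) for k in range(2)]
delta_x = [0.5 * sum(d * s[k] for d, s in zip(delta_g, genotypes)) for k in range(2)]

# Assumed constraint x_k1*alpha_k1 + x_k2*alpha_k2 = 0 gives
# alpha_k1 = (1 - x_k)*a_k, alpha_k2 = -x_k*a_k, and the fitted deviation for
# genotype s becomes sum_k a_k * z_k(s) with z_k(s) = n_k - 2*x_k.
z = [[s[k] - 2 * x[k] for k in range(2)] for s in genotypes]
M = [[sum(gs * zs[i] * zs[j] for gs, zs in zip(g, z)) for j in range(2)]
     for i in range(2)]
b = [sum(gs * zs[i] * (ws - mean_w) for gs, zs, ws in zip(g, z, w))
     for i in range(2)]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
a_hat = [(b[0] * M[1][1] - b[1] * M[0][1]) / det,    # 2x2 normal equations
         (b[1] * M[0][0] - b[0] * M[1][0]) / det]    # solved by Cramer's rule

alpha = [[(1 - x[k]) * a_hat[k], -x[k] * a_hat[k]] for k in range(2)]

# Sum of squares removed by the fit, and the right-hand side of (7.45).
fitted = [sum(a_hat[k] * zs[k] for k in range(2)) for zs in z]
removed = sum(gs * f * f for gs, f in zip(g, fitted))
formula = 2 * mean_w * sum(alpha[k][0] * delta_x[k] + alpha[k][1] * (-delta_x[k])
                           for k in range(2))
print(removed, formula)
```

The two printed values agree to rounding error, as (7.45) requires, even though the genotype frequencies were not generated by random mating.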

7.4.3 Marginal Fitnesses and Average Effects

The various multilocus genotype frequencies and fitnesses can be used to 
find single locus marginal fitnesses, single locus average effects, and single 
locus additive genetic variances, all of which are defined below. Having 
found these, we shall compare them to the corresponding true multilocus 
values. The main reason for doing this is that some theoretical calculation, 
or the data from some experiment, might focus on a particular locus, and it
is necessary to know what the relation is between one-locus marginal values 
for some quantity estimated from this experiment and the true multilocus 
values. 
The frequency of the single-locus genotype $A_{ku}A_{kv}$ at locus $k$, namely
$x_{ku,kv}$, is found by summing the frequencies of all multilocus genotypes
containing $A_{ku}A_{kv}$. The marginal fitness of the genotype
$A_{ku}A_{kv}$, denoted here $w_{ku,kv}$, is defined as a weighted average by

    $w_{ku,kv} = \frac{\sum^{\{ku,kv\}} g_s w_s}{x_{ku,kv}}$,    (7.46)

the sum again being taken over all multilocus genotypes containing the
genotype $A_{ku}A_{kv}$.

The mean fitness $\bar w(k)$ as calculated from the marginal fitnesses at
locus $k$ is found by replacing $w_{ij}$ by $w_{ku,kv}$ in equation (2.56).
This leads to the value $\bar w(k) = \sum_u \sum_v x_{ku,kv} w_{ku,kv}$.
Equation (7.46) shows that this marginal value is identical to the true
multilocus mean fitness $\bar w = \sum_s g_s w_s$. This conclusion allows us
to use $\bar w$ rather than $\bar w(k)$ in the calculations below. The change
in the frequency of the allele $A_{ku}$ as calculated from the marginal
fitnesses at locus $k$ is found from (2.60) and (2.62) as

    $\Delta x_{ku} = \sum_v \frac{x_{ku,kv}\,w_{ku,kv}}{\bar w} - x_{ku}$.    (7.47)

Equations (7.46) and (7.47) jointly show that this is identical to the true
change in the frequency of this allele, as given in (7.41). This allows us to
use the true change $\Delta x_{ku}$ rather than the marginally calculated
value in the calculations below. It also implies that the average excess of
any allele at any locus can also be calculated correctly from single-locus
marginal fitnesses.

The marginal $k$-locus additive effects estimates $\{\hat\alpha_{ku}\}$ are
defined by minimization of the expression (2.57), with $w_{ij}$ replaced by
$w_{ku,kv}$. These are thus found as the solutions of the simultaneous
equations

    $x_{ku}\hat\alpha_{ku} + \sum_v x_{ku,kv}\hat\alpha_{kv} = \bar w\,\Delta x_{ku}$.    (7.48)

In these equations we have used, on the right-hand side, the fact that
the $k$-locus marginal mean fitness, and each change in gene frequencies as
calculated from $k$-locus marginal fitnesses, are respectively equal to the
true multilocus values.

Equations of this form may then be written down for all loci and the 
resulting equations formed into one large system of simultaneous equations. 
With an appropriate ordering of loci and alleles within loci, this system of 
equations may be written as 

    $(D + P)\hat\alpha = \bar w\Delta$,    (7.49)

where D and P now have the same interpretation as in equation (7.44). This 
system of equations differs from the true multilocus system of equations 
(7.44), so we conclude that one-locus marginal frequencies and fitnesses do 
not lead to correct calculations of the average effects of the various alleles at 
the various loci in the entire genomic system. The marginal $k$-locus additive
genetic variance estimate is defined by (2.63), with each average effect
replaced by the corresponding $\hat\alpha_{ku}$. We denote this quantity by
$\sigma_A^2(k)$. The sum of these marginal estimates, taken over all loci in
the genome, is thus

    $\sum_k \sigma_A^2(k) = 2\bar w \sum_k \sum_u \hat\alpha_{ku}(\Delta x_{ku})$.    (7.50)
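The difference between the systems (7.44) and (7.49) can be seen in a toy calculation. The sketch below is an illustration under stated assumptions, not the author's procedure: it uses two biallelic loci, arbitrary (hypothetical) fitnesses, and regression slopes on centred allele counts as a stand-in for the average effects, the "true" slopes coming from the joint fit and the "marginal" slopes from fitting each locus on its own. When genotype frequencies at the two loci are independent the off-diagonal association vanishes and the two sets of slopes coincide; otherwise they differ.

```python
from itertools import product

genotypes = list(product(range(3), repeat=2))   # (n1, n2): copies of allele 1

def slopes(g, w):
    """Return (true, marginal, association): joint and one-locus regression
    slopes of fitness on allele counts, and the covariance between the allele
    counts at the two loci (the quantity that the matrix Q accounts for)."""
    mean_w = sum(gs * ws for gs, ws in zip(g, w))
    x = [0.5 * sum(gs * s[k] for gs, s in zip(g, genotypes)) for k in range(2)]
    z = [[s[k] - 2 * x[k] for k in range(2)] for s in genotypes]
    M = [[sum(gs * zs[i] * zs[j] for gs, zs in zip(g, z)) for j in range(2)]
         for i in range(2)]
    b = [sum(gs * zs[i] * (ws - mean_w) for gs, zs, ws in zip(g, z, w))
         for i in range(2)]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    true = [(b[0] * M[1][1] - b[1] * M[0][1]) / det,
            (b[1] * M[0][0] - b[0] * M[1][0]) / det]
    marginal = [b[k] / M[k][k] for k in range(2)]    # each locus fitted alone
    return true, marginal, M[0][1]

w = [0.8, 1.0, 0.7, 0.9, 1.2, 1.0, 0.6, 1.1, 0.9]    # hypothetical fitnesses

# Case 1: genotypes at the two loci are independent (but not Hardy-Weinberg).
p1, p2 = [0.3, 0.3, 0.4], [0.2, 0.5, 0.3]
g_indep = [p1[n1] * p2[n2] for n1, n2 in genotypes]
true_i, marg_i, assoc_i = slopes(g_indep, w)

# Case 2: genotypes at the two loci are associated.
raw = [4, 1, 2, 1, 6, 1, 2, 1, 5]
g_assoc = [r / sum(raw) for r in raw]
true_a, marg_a, assoc_a = slopes(g_assoc, w)

print(true_i, marg_i)
print(true_a, marg_a)
```

In the second case the marginal slope at locus 1 is several times the joint one, so summing one-locus quantities would misstate the multilocus additive variance.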



7.4.4 Implications

In this section we consider some implications of the above results. 

First, suppose that the true multilocus additive genetic variance
$\sigma_A^2$, defined in equation (7.45), is zero. This implies that the sum
of squares removed by fitting the $\alpha_{ku}$ values in (7.42) is zero,
which implies that the $\alpha_{ku}$ values themselves are zero. This in turn
implies, from equations (7.43), that each $\Delta x_{ku}$ is zero, so that for
one generation at least, gene frequencies do not change. However this does
not necessarily imply that the full multilocus system is in equilibrium:
Genotype frequencies can change without any resulting gene frequency change.

The converse is also true. If each $\Delta x_{ku}$ is zero, then the
uniqueness of the solution of equations (7.43) implies that each
$\alpha_{ku}$ is zero, and this in turn implies that the true multilocus
additive genetic variance is zero.

Next, we have already observed that the true multilocus average effects
are not in general correctly calculated from marginal $k$-locus fitness
values. Equality between the true values and the values calculated from
marginal fitnesses will arise if

    $Q_{ku,rt} = 2x_{ku}x_{rt}$    (7.51)

for all pairs $(ku, rt)$. In practice, this can be taken as a necessary
condition also, and is a condition for total linkage equilibrium of all
alleles at all pairs of loci. This condition is unlikely to hold even in the
random-mating case, and is even less likely under non-random mating. Thus, in
practice, multilocus average effects will not be estimated correctly from
one-locus marginal fitness values.

This conclusion, together with the close connection between additive 
genetic variances and average effects, as shown for example in equation 
(7.45), implies that in general, the true total additive genetic variance is 
not correctly found by summing one-locus marginal estimates. This
is despite the fact that the contrary assumption is frequently made in the 
classical literature. 

Despite this conclusion, some multilocus conclusions concerning additive
genetic variances can be found from single-locus marginal results. If the
$k$-locus marginal additive genetic variance is zero, then each
$\hat\alpha_{ku}$ is zero. The converse is also true. Further, if each
$\hat\alpha_{ku}$ is zero, then each $\Delta x_{ku}$ is zero. The converse of
this statement is also true. Thus each of these three results implies the
other two.

If every single-locus marginal additive genetic variance is zero then every
$\Delta x_{ku}$ is zero, for all $k$ and $u$. This implies that every
$\alpha_{ku}$ is zero, by uniqueness, and thus that the true total additive
genetic variance is zero. This sequence of implications also works in reverse
order.

No mention has been made in this section of the additive gametic
variance. This is because, when mating is non-random, gametic frequencies
are not of value in determining genotype frequencies, so that the additive
gametic variance, while it can be defined, is not useful.



7.4.5 The Fundamental Theorem of Natural Selection 

Both the Price (1972) and the Lessard (1997) one-locus interpretations of
the FTNS given in Section 2.9 can be generalized to the multilocus case.
In the notation of Sections 7.2 and 7.4.2, the generalization of the Price
one-locus interpretation (2.72) of the FTNS is that

    $\Delta_P(\bar w) = \sum_s (\Delta g_s)(w_s)_\alpha = \sigma_A^2/\bar w$,    (7.52)

where

    $(w_s)_\alpha = \bar w + \sum_k \sum_u c(ku,s)\,\alpha_{ku}$,    (7.53)

the sum being taken over all alleles in the genome, and with $c(ku,s)$ being
as defined immediately below (7.40).

The Lessard (1997) multilocus generalization of the Lessard single-locus
statement (2.74) of the FTNS is that

    $\Delta_P(\bar w) = \sum_s (\Delta g_s)_\alpha\,w_s = \sigma_A^2/\bar w$.    (7.54)

In this expression the partial change $(\Delta g_s)_\alpha$ in the frequency
of the genotype $G_s$ is defined by

    $(\Delta g_s)_\alpha = \frac{g_s \sum_k \sum_u c(ku,s)\,\alpha_{ku}}{\bar w}$.    (7.55)

The interpretations of $(w_s)_\alpha$ and $(\Delta g_s)_\alpha$ in the Price
and the Lessard interpretations parallel the one-locus interpretations given
in Section 2.9.

The proofs of these two claims are almost immediate. The Price concept
of the partial change in mean fitness is the middle term in (7.52), and since
$\sum_s \Delta g_s\,\bar w = 0$, (7.52) and (7.53) jointly show that this is

    $\sum_k \sum_u \left(\sum_s \Delta g_s\,c(ku,s)\right)\alpha_{ku}$.    (7.56)

Equation (7.41) shows that this expression reduces to

    $2\sum_k \sum_u \alpha_{ku}\,\Delta x_{ku}$,

and then (7.45) shows that this is equal to $\sigma_A^2/\bar w$. These simple
steps complete the proof of the multilocus version of the FTNS in the Price
interpretation. The proof of the Lessard version of the theorem is similar.
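Both statements of the theorem can be verified numerically on a toy example. The sketch below is hedged in the same way as any such check: it uses a hypothetical two-locus, two-allele system with arbitrary genotype frequencies and fitnesses, obtains the summed average effects carried by each genotype from a least-squares fit on centred allele counts, and takes the Lessard partial change in genotype frequency to be $g_s$ times that sum divided by $\bar w$. Variable names are ours.

```python
from itertools import product

genotypes = list(product(range(3), repeat=2))        # (n1, n2): copies of allele 1
raw = [4, 1, 2, 1, 6, 1, 2, 1, 5]                    # hypothetical weights
g = [r / sum(raw) for r in raw]                      # genotype frequencies g_s
w = [0.8, 1.0, 0.7, 0.9, 1.2, 1.0, 0.6, 1.1, 0.9]    # hypothetical fitnesses w_s

mean_w = sum(gs * ws for gs, ws in zip(g, w))
delta_g = [gs * ws / mean_w - gs for gs, ws in zip(g, w)]
x = [0.5 * sum(gs * s[k] for gs, s in zip(g, genotypes)) for k in range(2)]

# Least-squares fit of w_s - mean_w on centred allele counts z_k = n_k - 2*x_k.
z = [[s[k] - 2 * x[k] for k in range(2)] for s in genotypes]
M = [[sum(gs * zs[i] * zs[j] for gs, zs in zip(g, z)) for j in range(2)]
     for i in range(2)]
b = [sum(gs * zs[i] * (ws - mean_w) for gs, zs, ws in zip(g, z, w))
     for i in range(2)]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
a_hat = [(b[0] * M[1][1] - b[1] * M[0][1]) / det,
         (b[1] * M[0][0] - b[0] * M[1][0]) / det]

# Sum of the average effects of all alleles carried by genotype s.
effect_sum = [a_hat[0] * zs[0] + a_hat[1] * zs[1] for zs in z]

sigma2_A = sum(gs * e * e for gs, e in zip(g, effect_sum))

# Price form: true frequency changes, fitnesses replaced by additive values.
price = sum(dg * (mean_w + e) for dg, e in zip(delta_g, effect_sum))

# Lessard form: partial frequency changes, true fitnesses.
lessard = sum((gs * e / mean_w) * ws for gs, e, ws in zip(g, effect_sum, w))

print(price, lessard, sigma2_A / mean_w)
```

The three printed quantities coincide to rounding error, without any assumption about the mating scheme that produced the genotype frequencies.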

There are several remarks to make about the full multilocus FTNS. Perhaps
the most important of these is that the two interpretations of the theorem
given above are identical in that they both make the same algebraic
statement, as was the case for the one-locus version of the theorem. To this
extent there is a parallel between the one-locus and the multilocus versions
of the theorem. There are, however, two aspects of the multilocus case which
differ from those of the corresponding one-locus case. First, Fisher’s
“proof” of the theorem in the multilocus case rested on a simple summation of
one-locus additive genetic variances over all loci in the genome. Doing this
is tantamount to claiming that multilocus average effects are the same as
one-locus marginal estimates and that the multilocus additive genetic
variance is the sum of one-locus values. The discussion in Section 7.4.4
shows that in practice, both claims are true only in the very unlikely event
that all equations of the form (7.51) hold. It is then perhaps surprising
that the theorem, when analyzed in algebraic detail, does indeed generalize
to the multilocus case. The effects of the two incorrect assumptions cancel
each other out.

This comment is connected to the second aspect of the relation between 
one-locus and multilocus results. It was shown in Section 2.9 with reference 
to the single-locus case that if fitness depends on the genotype at some given 
locus only and the equations in (2.76) hold for the genotypes at that locus, 
then the total change in mean fitness is equal to the partial change in mean 
fitness, defined either through (2.73) or (2.74). This conclusion no longer 
holds in the multilocus case. 

To see this, suppose that equations of the form (2.76) hold for all triples
of one-locus genotypes at all loci in the genome. Then equations of the
form (2.77), (2.78) and (2.79) also hold for all possible genotype triples at
all loci. Multiplying throughout by $\bar w$ in equation (2.79) and changing
to multilocus notation, we get

    $\bar w\left(x_{ku}\beta_{ku} + \sum_v x_{ku,kv}\beta_{kv}\right) = \bar w(\Delta x_{ku})$.    (7.57)

The right-hand sides in equations (7.43) and (7.57) are equal so that,
equating the two left-hand sides,

    $x_{ku}\alpha_{ku} + \sum_v x_{ku,kv}\alpha_{kv} + \sum_{m \neq k}\sum_t Q_{ku,mt}\alpha_{mt} = \bar w\left(x_{ku}\beta_{ku} + \sum_v x_{ku,kv}\beta_{kv}\right)$.    (7.58)

In contrast to the corresponding conclusion in the one-locus case, this
equation does not necessarily imply that $\beta_{ku} = \alpha_{ku}/\bar w$,
and thus does not necessarily imply that equations of the form (2.77) hold.
This in turn implies that even if (2.76) holds for all triples of genotypes
at all loci in the genome, the total change in mean fitness is not
necessarily equal to the partial change. In practice, since the term
$\sum_{m \neq k}\sum_t Q_{ku,mt}\alpha_{mt}$ in (7.58) is very unlikely to be
zero, total and partial changes will be very unlikely to be equal.

In conclusion we emphasize the breadth of application of the multilocus 
FTNS. It is a whole-genome result and makes no assumption about the 
mating scheme, so that random mating is not assumed. Further, while the 
derivations given above all assume viability fitness, Lessard and Castilloux 
(1995) have shown that with the same assumptions as those made for via- 
bility selection, the theorem also holds in the fertility fitness case. On the 
other hand, all the various simplifying assumptions made in the single-locus
version of the theorem continue to be made. For example, the complications
caused by the sex chromosomes are ignored, as are those caused by the
very existence of two sexes. This restricts for the moment the real-world 
applicability of the theorem, although a generalization of it to cover the 
case of two sexes is surely possible. 



7.4.6 Optimality Principles

It is natural to follow a long-established practice in physics, associated there 
for example with least action principles, and to ask: What is optimized
under the gene frequency changes brought about by natural selection? One
approach to this question was opened up (in the one-locus, random-mating
case) by Kimura (1958) and has been taken up by, among others, Crow
and Kimura (1970), Svirezhev (1972), Shahshahani (1979), Akin (1982) and
Ewens (1992). We show below that the entire-genome, arbitrary-mating 
version of the FTNS leads to an optimality principle of natural selec- 
tion generalizing Kimura’s principle, which is restricted to the one-locus, 
random-mating multiple-allele case discussed in Section 2.4. A second, and 
quite different, optimality principle was introduced by Svirezhev (1972), 
and we discuss this principle below. 

The discrete-time version of Kimura’s (1958) principle derives from various
results in Section 2.4, and depends in particular on the gene frequency
changes implicit in (2.7) and (2.8), the mean fitness defined in (2.10) and the
additive genetic variance defined in (2.17). We also note that this
expression for the additive genetic variance, together with the natural
selection change in gene frequency

$$\Delta x_i = x_i(w_i - \bar{w})/\bar{w} \tag{7.59}$$

implied by (2.8), shows that the partial increase in mean fitness, described
in (2.72), can be written as

$$2\sum_{i} \Delta x_i\,(w_i - \bar{w}). \tag{7.60}$$

Suppose now that gene frequencies are changed by arbitrary amounts
$d_1, d_2, \ldots, d_k$, where necessarily $\sum_i d_i = 0$. We could then say,
from (7.60), that the partial increase in mean fitness is

$$2\sum_{i} d_i\,(w_i - \bar{w}). \tag{7.61}$$

Kimura (1958) then showed that under the restriction

$$\sum_{i} d_i^2/x_i = \sigma_A^2/(2\bar{w}^2), \tag{7.62}$$

the set $\{d_i\}$ of gene frequency changes that maximizes the partial increase
in mean fitness (7.61) is the natural selection set defined in (7.59).
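
The maximizing property can be checked numerically. The sketch below is only an illustration: the allele frequencies and fitness matrix are hypothetical values chosen for the example, and the function names are ours. It computes the natural selection changes (7.59), confirms that they satisfy the restriction (7.62), and then compares the linear function (7.61) at these changes with its value at randomly generated changes that sum to zero and satisfy the same restriction. That no random competitor does better is what the Cauchy-Schwarz inequality leads one to expect.

```python
import math
import random


def selection_quantities(x, W):
    """One-locus, random-mating viability selection.

    x: allele frequencies; W: symmetric matrix of genotype fitnesses w_ij.
    Returns the marginal fitnesses w_i, the mean fitness, and the
    natural selection changes of (7.59).
    """
    k = len(x)
    marginal = [sum(W[i][j] * x[j] for j in range(k)) for i in range(k)]
    mean = sum(x[i] * marginal[i] for i in range(k))
    delta = [x[i] * (marginal[i] - mean) / mean for i in range(k)]
    return marginal, mean, delta


def partial_increase(d, marginal, mean):
    """The linear function (7.61): 2 * sum_i d_i (w_i - wbar)."""
    return 2.0 * sum(di * (wi - mean) for di, wi in zip(d, marginal))


def kimura_norm(d, x):
    """Left-hand side of (7.62): sum_i d_i^2 / x_i."""
    return sum(di * di / xi for di, xi in zip(d, x))


def random_feasible_change(x, target, rng):
    """A random vector d with sum(d) = 0 and kimura_norm(d, x) = target."""
    raw = [rng.gauss(0.0, 1.0) for _ in x]
    total = sum(raw)
    d = [r - xi * total for r, xi in zip(raw, x)]  # sums to zero since sum(x) = 1
    scale = math.sqrt(target / kimura_norm(d, x))
    return [scale * di for di in d]


# Hypothetical example values (not taken from the text).
x = [0.2, 0.3, 0.5]
W = [[1.00, 0.90, 1.10],
     [0.90, 1.20, 0.95],
     [1.10, 0.95, 0.80]]

marginal, mean, delta = selection_quantities(x, W)
additive_variance = 2.0 * sum(xi * (wi - mean) ** 2 for xi, wi in zip(x, marginal))
target = additive_variance / (2.0 * mean ** 2)  # right-hand side of (7.62)

best = partial_increase(delta, marginal, mean)
rng = random.Random(0)
trials = [partial_increase(random_feasible_change(x, target, rng), marginal, mean)
          for _ in range(2000)]
```

Here `best` is the partial increase under natural selection and `max(trials)` is the largest value found among the random competitors.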

While this is an interesting conclusion, it is incomplete, since no inde- 
pendent extrinsic reason is given for imposing the constraint (7.62). From 
a purely mathematical point of view there will always be some quadratic 
constraint such that a linear function such as (7.61) will be maximized at 
some specified set of {di} values. Unless such a reason can be found for im- 
posing the constraint, the maximizing conclusion loses its force. This was 
pointed out, for example, by Edwards (1974). 

Moreover, the constraint is an important one. Crow and Kimura (1970)
claimed that the result holds without the constraint, but this is not
correct. This was pointed out by Svirezhev (1972). Svirezhev carried out his
analysis in terms of the non-Euclidean distance measure
$(ds)^2 = \sum_i d_i^2/x_i$ appearing on the left-hand side of (7.62). This was
done for mathematical convenience and no biological justification is needed
for this procedure.
We will see later that non-Euclidean distance measures are natural in the 
optimizing context. 

Edwards (1974) noted that the Kimura optimizing principle can be cast
in an equivalent dual form. This begins with the observation that when
the arbitrary set of changes $\{d_i\}$ in gene frequency is equal to the natural
selection set of changes $\{\Delta x_i\}$, the expression in (7.61) is equal to
the partial increase $\sigma_A^2/\bar{w}$ in mean fitness. Then the dual form of
the principle is that under the constraint

$$2\sum_{i} d_i\,(w_i - \bar{w}) = \sigma_A^2/\bar{w}, \tag{7.63}$$

together with the natural constraint $\sum_i d_i = 0$, the set $\{d_i\}$ of
arbitrary changes in gene frequency that minimizes the quantity
$\sum_i d_i^2/x_i$ appearing on the left-hand side in (7.62) is the natural
selection set $\{\Delta x_i\}$.

Since, from (2.16), the expression $w_i - \bar{w}$ appearing in (7.63) is
identical to $\alpha_i$, it is equivalent to write the constraint (7.63) as

$$2\mathbf{d}'\boldsymbol{\alpha} = 2\sum_{i} d_i\alpha_i = \sigma_A^2/\bar{w}. \tag{7.64}$$

Our aim in this section is to extend the Kimura maximizing principle 
both to the whole genome level and, simultaneously, to the non-random- 
mating case, and also to find a natural biological reason why a constraint 
of the form (7.62), or alternatively of the form (7.64), arises. We consider 
only the one-locus case in detail and use matrix methods in the analysis, 
since with a change in the definition of the matrices involved, the multi- 
locus extension follows almost immediately from the single-locus analysis. 
Specifically, we use the matrix notation defined immediately above (2.65), 
re-writing this equation as 

$$\boldsymbol{\Delta} = (\bar{w})^{-1}(D + P)\boldsymbol{\alpha}. \tag{7.65}$$

The broad aim of the optimizing principle that we seek is to maximize, or 
to minimize, some function subject to some constraint. In order to suggest a 
starting point for doing this, we recall the derivation of the average effects of 
alleles through the minimization of the quadratic form (2.57) subject to the 
constraint (2.58) (in the single-locus case), or minimization of the quadratic
form (7.42) subject to the constraint (7.21) (in the multilocus case). We 
therefore re-consider this procedure, and observe that the constraint (2.58), 
used in the derivation of (2.65) and thus of (7.65), is unnecessary, since 
this constraint is automatically satisfied at the unconditional minimum of 
the quadratic form (2.57). We are therefore free to impose an alternative 
constraint also satisfied at the unconditional minimum of this quadratic 
form. For this purpose we impose the constraint implied by (2.63), which 
we rewrite as 

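$$\boldsymbol{\alpha}'\boldsymbol{\Delta} = \sum_{u} \alpha_u\,\Delta x_u = \sigma_A^2/(2\bar{w}). \tag{7.66}$$

It can then be shown that the calculation of the average effects
$\boldsymbol{\alpha}$ is equivalent to the minimization of the quadratic form

$$\boldsymbol{\alpha}'(D + P)\boldsymbol{\alpha}, \tag{7.67}$$

subject to the constraint (7.66).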

We now turn to finding an optimality principle generalizing that of
Kimura. Let $\mathbf{d}' = (d_1, d_2, \ldots, d_k)$ be an arbitrary vector of
gene frequency changes. Generalizing the aim of the dual form of the Kimura
principle, we seek to find some quadratic form $\mathbf{d}'T^{-1}\mathbf{d}$ in
these gene frequency changes, generalizing the expression on the left-hand
side of (7.62), which, subject to some natural constraint, is minimized at
the natural selection values $\mathbf{d} = \boldsymbol{\Delta}$. The constraint
we use is the Edwards constraint given in (7.64).

This minimum is obtained, using Lagrange multipliers, by finding the
absolute minimum of the function

$$\mathbf{d}'T^{-1}\mathbf{d} - \lambda\,(2\mathbf{d}'\boldsymbol{\alpha} - \sigma_A^2/\bar{w}) \tag{7.68}$$

with respect to the elements in $\mathbf{d}$. Standard methods show that the
minimum of this Lagrangian expression arises when

$$\mathbf{d} = \lambda T\boldsymbol{\alpha}, \tag{7.69}$$

where in this equation we may take $\lambda$ as a disposable constant.
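
The "standard methods" step can be spelled out as follows. This is a sketch only, and it assumes that $T$ is symmetric and invertible, which is not stated explicitly at this point in the text. Writing $L(\mathbf{d})$ for the function in (7.68) and differentiating with respect to $\mathbf{d}$:

```latex
\begin{align*}
L(\mathbf{d}) &= \mathbf{d}'T^{-1}\mathbf{d}
      - \lambda\,\bigl(2\mathbf{d}'\boldsymbol{\alpha} - \sigma_A^2/\bar{w}\bigr),\\[2pt]
\frac{\partial L}{\partial \mathbf{d}} &= 2T^{-1}\mathbf{d} - 2\lambda\boldsymbol{\alpha} = \mathbf{0}
      \quad\Longrightarrow\quad \mathbf{d} = \lambda T\boldsymbol{\alpha},\\[2pt]
2\mathbf{d}'\boldsymbol{\alpha} = \sigma_A^2/\bar{w}
      &\quad\Longrightarrow\quad
      \lambda = \frac{\sigma_A^2}{2\bar{w}\,\boldsymbol{\alpha}'T\boldsymbol{\alpha}}.
\end{align*}
```

The last line shows the value the multiplier would take if the constraint were imposed exactly. If $T$ is also positive definite, the stationary point is a minimum.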

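If we wish this conditional minimum to arise when
$\mathbf{d} = \boldsymbol{\Delta}$, it is necessary to choose $T$ so that

$$\boldsymbol{\Delta} = \lambda T\boldsymbol{\alpha}. \tag{7.70}$$

A comparison of this equation with (7.65) shows that we must take the
matrix $T$ to be some multiple of the matrix $D + P$. Whatever this choice of
multiple might be, minimization of $\mathbf{d}'T^{-1}\mathbf{d}$ is equivalent
to minimization of

$$\mathbf{d}'(D + P)^{-1}\mathbf{d}, \tag{7.71}$$

so we are now free to think of our aim as the minimization of this expression.
Suppose we choose $T = (\bar{w})^{-2}(D + P)$ and put
$\mathbf{z} = \bar{w}(D + P)^{-1}\mathbf{d}$. Then

$$\mathbf{d}'T^{-1}\mathbf{d} = \mathbf{z}'(D + P)\mathbf{z}, \tag{7.72}$$

and the constraint $\mathbf{d}'\boldsymbol{\alpha} = \sigma_A^2/(2\bar{w})$ given
in (7.64) becomes

$$\mathbf{z}'(D + P)(\bar{w})^{-1}\boldsymbol{\alpha} = \sigma_A^2/(2\bar{w}). \tag{7.73}$$

Since, from (2.65), $(D + P)\boldsymbol{\alpha} = \bar{w}\boldsymbol{\Delta}$,
(7.73) may be written as

$$\mathbf{z}'\boldsymbol{\Delta} = \sigma_A^2/(2\bar{w}). \tag{7.74}$$

In other words, we can think of our aim as the minimization of
$\mathbf{z}'(D + P)\mathbf{z}$, subject to the constraint in (7.74).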

But this is identical to the procedure defining the average effects of the 
alleles given above, that is the minimization of the expression in (7.67) 
subject to the constraints in (7.66). To the extent that we think of the 
calculation of the average effects as being a natural one, the minimization 
procedure that we have arrived at is also a natural one. 

Further, it is easy to see that under random mating, the expression
$\mathbf{d}'(D + P)^{-1}\mathbf{d}$ reduces to the Kimura expression
$\sum_i d_i^2/x_i$, so that our procedure is a direct generalization of his
to the non-random-mating case.
Thus we have not only generalized his procedure but have also justified the 
conditioning restriction in that procedure. 
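
The random-mating reduction can be checked numerically. The following is a minimal sketch under two assumptions about notation that is defined outside this section: that $D$ is the diagonal matrix of allele frequencies, and that under random mating $P$ has entries $x_i x_j$. The frequencies and changes are hypothetical, and the small linear solver is included only to keep the example self-contained.

```python
def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (M[r][n] - tail) / M[r][r]
    return y


def generalized_distance(d, x, P):
    """The quadratic form d'(D + P)^{-1} d, with D = diag(x)."""
    n = len(x)
    A = [[P[i][j] + (x[i] if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    y = solve(A, d)  # y = (D + P)^{-1} d
    return sum(di * yi for di, yi in zip(d, y))


# Hypothetical allele frequencies and frequency changes (changes sum to zero).
x = [0.1, 0.3, 0.6]
d = [0.02, -0.05, 0.03]

# Assumed random-mating form of P: P_ij = x_i * x_j.
P = [[xi * xj for xj in x] for xi in x]

kimura = sum(di * di / xi for di, xi in zip(d, x))
general = generalized_distance(d, x, P)
```

Under these assumptions `general` and `kimura` agree up to rounding error. The agreement relies on the changes summing to zero.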

In the random-mating case Svirezhev (1972) used the expression
$\sum_i d_i^2/x_i$ as a natural non-Euclidean distance metric between parental
and daughter generations. The above analysis shows that
$\mathbf{d}'(D + P)^{-1}\mathbf{d}$ is the natural generalization of this to
the non-random-mating case.

The multilocus generalization of this follows almost immediately. The
parallel between the single-locus matrix equation (2.65) and the multilocus
matrix equation (7.44), and the natural flexibility of matrix operations,
show that the changes in gene frequency brought about by natural
selection are those which minimize the non-Euclidean genetic distance
$\mathbf{d}'(D + P + Q)^{-1}\mathbf{d}$ between parental and daughter generation
multilocus gene frequency values, subject to the constraint that the partial
increase in mean fitness between the two generations is $\sigma_A^2/\bar{w}$.
As in the single-locus case, there is a natural reason why the quantity
$\mathbf{d}'(D + P + Q)^{-1}\mathbf{d}$ can be regarded as a distance measure.

Another form of optimality was established by Svirezhev (1972). This
refers only to the continuous-time, single-locus, random-mating case, and
does not appear to generalize to discrete time, many loci or the
non-random-mating case. It is based on the fact that the equations of motion
in physics can be obtained by an optimizing principle. Specifically, the path
taken by an object is that which minimizes the total difference between
kinetic and potential energy over the path. Adopting the notation of
Section 2.7, we let $\dot{x}_i$ be the rate of change of the frequency of the
allele $A_i$. Svirezhev (1972) then forms the function $f$ defined by

$$f = \sum_{i} \frac{\dot{x}_i^2}{x_i} + \sum_{i} x_i(w_i - \bar{w})^2, \tag{7.75}$$

where $w_i$ and $\bar{w}$ are defined respectively in (2.9) and (2.10). This
function is thought of as the difference between a kinetic and a potential
energy, the latter being half the additive genetic variance and being zero
at the equilibrium point(s) of the evolutionary system. The aim is to
minimize the function

$$\int_{t_1}^{t_2} f\,dt$$

subject to the natural condition $\sum_i x_i = 1$, for given starting and
finishing times $t_1$ and $t_2$ and given initial and final gene frequencies.
Svirezhev finds that the minimum is found when
$\dot{x}_i = x_i(w_i - \bar{w})$, which is exactly the equation of motion
(2.41) that natural selection brings about.

It has been claimed that natural selection acts in such a way as to
minimize the total integral of the additive genetic variance along the
evolutionary path of gene frequencies. The claim is made since along this
path the equation $\dot{x}_i = x_i(w_i - \bar{w})$ holds, and substituting
this value of $\dot{x}_i$ into the equation (7.75) for $f$, it is found that
$f = 2\sum_i x_i(w_i - \bar{w})^2$, the additive genetic variance. However
this argument is not correct, since the equation

$$\sum_{i} \frac{\dot{x}_i^2}{x_i} = \sum_{i} x_i(w_i - \bar{w})^2$$

does not hold for arbitrary evolutionary trajectories.



7.5 The Correlation Between Relatives

In this section we investigate properties of the correlation between rela- 
tives for characters that are assumed to depend on a large number of loci. 
A full discussion of this topic would take into account various forms of 
non-random mating and also environmental effects. A full consideration of 
these effects would take us far further into the biometrical aspects of popu- 
lation genetics than we wish to go. Further, correlations in the non-random 
mating case become extremely complicated, so we assume random mating 
throughout this section. Our aim is to restrict attention to the effects of 
linkage and linkage disequilibrium on standard formulas for these correla- 
tions under the simplifying assumptions we make. A brief discussion of the 
entire question of the correlation between relatives will be given in Chapter 
8. 

We first consider characters determined by a single locus and consider
a more efficient method of arriving at (1.17)-(1.20). In this and in later
generalizations to characters determined by many loci, it is convenient to
calculate covariances rather than correlations: The latter can always be
found from the former by dividing by the total variance $\sigma^2$ in the
character considered.

Suppose that the alleles at the locus in question are
$A_1, A_2, \ldots, A_k$, with respective frequencies $x_1, x_2, \ldots, x_k$,
and that the value for the character for an $A_uA_v$ individual is $m_{uv}$.
We define $\bar{m}$ and the average excess $\alpha_u$ of $A_u$ as in (2.60),
and write

$$m_{uv} = \bar{m} + \alpha_u + \alpha_v + d_{uv}, \tag{7.76}$$

so that

$$\sum_{u} \alpha_u x_u = 0, \qquad \sum_{u} d_{uv}x_u = 0 \ \text{ for all } v. \tag{7.77}$$

The additive and dominance genetic variances $\sigma_A^2$ and $\sigma_D^2$ are
given by (1.16) and (1.15). Consider now two individuals X and Y, with
measurements m(X) and m(Y). The covariance between these measurements
is

$$E\{(m(X) - \bar{m})(m(Y) - \bar{m})\}. \tag{7.78}$$

If the two individuals are unrelated, this covariance is zero. On the other
hand, related individuals can possess genes in common that are identical
by descent from one or more common ancestors. Thus, for example, the
contribution $\alpha_u$ in (7.76) may be identical in both individuals, since
both possess an $A_u$ gene passed on from a common ancestor. Suppose the genes
possessed by individual X are $x_f$, $x_m$, where the suffixes $f$ and $m$
denote the genes passed on from father and mother, respectively, and define
$y_f$, $y_m$ similarly for individual Y. We use the symbol “≡” to denote
“identical by descent”, and define

$$\begin{aligned}
P_{ff} &= \mathrm{Prob}(x_f \equiv y_f), & P_{fm} &= \mathrm{Prob}(x_f \equiv y_m),\\
P_{mf} &= \mathrm{Prob}(x_m \equiv y_f), & P_{mm} &= \mathrm{Prob}(x_m \equiv y_m).
\end{aligned} \tag{7.79}$$

Then by inserting (7.76) in (7.78) it is found that

$$\mathrm{covar}(X, Y) = \tfrac{1}{2}(P_{ff} + P_{fm} + P_{mf} + P_{mm})\,\sigma_A^2
 + (P_{ff}P_{mm} + P_{fm}P_{mf})\,\sigma_D^2. \tag{7.80}$$

This formula, due essentially to Malécot (1948), provides a simple and
powerful method for deriving covariances for any two related individuals,
and we now use it to re-derive (1.17)-(1.20).

Consider first the father-son correlation, with X being the father and Y
the son. Since the mother and father are assumed to be unrelated,
$P_{mm} = P_{fm} = 0$. Also $P_{ff} = P_{mf} = \tfrac{1}{2}$, and insertion of
these values into (7.80) yields (1.17). If X and Y are full sibs,
$P_{ff} = P_{mm} = \tfrac{1}{2}$ and $P_{fm} = P_{mf} = 0$, and insertion of
these values in (7.80) gives (1.18). Equations (1.19) and (1.20) can be found
equally easily.

We next extend (7.80) to the case where the character in question
depends on all K loci in the genome. Here additive x additive, additive x
dominance, and further variance terms enter into the covariances, and it is
necessary to develop notation for these. We write $\sum \sigma_A^2$ for the
sum, over all loci, of the (marginal) additive genetic variances at those
loci, with a similar notation for dominance variances. We also write
$\sum \sigma_{AA}^2$ for the sum, over all possible pairs of loci, of the
additive x additive variances for each pair, $\sum \sigma_{AD}^2$ for the sum
of all additive x dominance variances, and more generally

$$\sum \sigma^2_{A^rD^s}, \tag{7.81}$$

for $1 \le r + s \le K$, for the sum of all possible r-wise additive and s-wise
dominance variances.

Consider first the simplest case when all loci involved are unlinked and all
pairwise coefficients of linkage disequilibrium are zero. Kempthorne (1954,
1955) obtained the appropriate generalization of (7.80) as

$$\mathrm{covar}(X, Y) = {\sum}' \left\{\tfrac{1}{2}(P_{ff} + P_{fm} + P_{mf} + P_{mm})\right\}^{r}
 (P_{ff}P_{mm} + P_{mf}P_{fm})^{s}\,\sigma^2_{A^rD^s}, \tag{7.82}$$

where the summation $\sum'$ is over all r and s with $1 \le r + s \le K$. Writing

$$\alpha = \tfrac{1}{2}(P_{ff} + P_{fm} + P_{mf} + P_{mm}), \qquad
 \beta = P_{ff}P_{mm} + P_{fm}P_{mf}, \tag{7.83}$$

this becomes, for a character depending on two loci,

$$\mathrm{covar}(X, Y) = \alpha\sum\sigma_A^2 + \alpha^2\sigma_{AA}^2 + \beta\sum\sigma_D^2
 + \beta^2\sigma_{DD}^2 + \alpha\beta\,\sigma_{AD}^2, \tag{7.84}$$

where both summations are over the two loci involved. This confirms in
an elegant way the correlations found in Section 2.10. It follows also from
(7.82) that for a character depending on an arbitrary number of loci,

$$\mathrm{covar}(\text{father-son}) = \tfrac{1}{2}\sum\sigma_A^2 + \tfrac{1}{4}\sum\sigma_{AA}^2
 + \cdots + \left(\tfrac{1}{2}\right)^{K}\sigma^2_{A^K}, \tag{7.85}$$

$$\mathrm{covar}(\text{grandfather-grandson}) = \tfrac{1}{4}\sum\sigma_A^2 + \tfrac{1}{16}\sum\sigma_{AA}^2
 + \cdots + \left(\tfrac{1}{4}\right)^{K}\sigma^2_{A^K}. \tag{7.86}$$

Clearly ancestral line covariances do not contain dominance terms. It is
not clear how important the various terms in (7.85) and (7.86) are. While
for large r each r-order additive interaction term is probably very small,
there are $\binom{K}{r}$ terms in $\sum\sigma^2_{A^r}$ and, even allowing for
the factor $2^{-r}$, the total contribution of r-order interaction terms need
not be small.
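
The way (7.80) and (7.82) turn identity-by-descent probabilities into covariances can be sketched in a few lines of code. The example below is illustrative only: the function names are ours, the variance components are hypothetical numbers for a two-locus character, and exact fractions are used so that the coefficients can be read off directly. The identity-by-descent probabilities for father-son and full sibs are those given in the text. Those for a paternal grandfather and his grandson ($P_{ff} = P_{mf} = \tfrac{1}{4}$) are our own elementary calculation.

```python
from fractions import Fraction


def ibd_coefficients(p_ff, p_fm, p_mf, p_mm):
    """Return (alpha, beta) of (7.83) from the four identity-by-descent probabilities."""
    alpha = Fraction(1, 2) * (p_ff + p_fm + p_mf + p_mm)
    beta = p_ff * p_mm + p_fm * p_mf
    return alpha, beta


def covariance(alpha, beta, components):
    """Sum alpha^r * beta^s * variance over the (r, s) components, as in (7.82).

    components maps (r, s) to the summed variance for r-wise additive and
    s-wise dominance interactions.
    """
    return sum(alpha ** r * beta ** s * value
               for (r, s), value in components.items())


half, quarter = Fraction(1, 2), Fraction(1, 4)

# Hypothetical variance components for a two-locus character:
# (1, 0) additive, (0, 1) dominance, (2, 0) additive x additive,
# (1, 1) additive x dominance, (0, 2) dominance x dominance.
components = {(1, 0): 8, (0, 1): 4, (2, 0): 2, (1, 1): 1, (0, 2): 1}

father_son = covariance(*ibd_coefficients(half, 0, half, 0), components)
full_sibs = covariance(*ibd_coefficients(half, 0, 0, half), components)
grandfather_grandson = covariance(*ibd_coefficients(quarter, 0, quarter, 0),
                                  components)
```

With these inputs the father-son and grandfather-grandson covariances contain no dominance terms, because $\beta = 0$ for both, while the full-sib covariance picks up every component.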

Our next objective is to generalize (7.82) to allow for linkage between
loci, continuing however to assume complete linkage equilibrium. This
generalization has been considered for two loci by Cockerham (1956) and for
K loci by Schnell (1963) and van Aarde (1975). The typical term in (7.82)
is the sum of expressions of the form

$$\alpha^{r}\beta^{s}\,\sigma^2(k_1, k_2, \ldots, k_r;\ l_1, l_2, \ldots, l_s), \tag{7.87}$$

where $k_1 < k_2 < \cdots < k_r$, $l_1 < l_2 < \cdots < l_s$ with
$k_p \ne l_q$ for all p, q and
$\sigma^2(k_1, k_2, \ldots, k_r;\ l_1, l_2, \ldots, l_s)$ denotes the
contribution to the variance of the interaction of the additive effects at
loci $k_1, k_2, \ldots, k_r$ and the dominance effects at loci
$l_1, l_2, \ldots, l_s$. The coefficient $\alpha^{r}\beta^{s}$ is the
probability of r + s independent events, the independence arising from the
assumptions that the loci are unlinked and that linkage equilibrium obtains,
composed of events typified by

$E^{k}_{ff}$ = {genes derived by X and Y at locus k from their respective
fathers are identical by descent}.

For unlinked loci events of the form $E^{k}_{ff}$ and $E^{l}_{ff}$ are
independent for $k \ne l$ if we continue to assume linkage equilibrium, but
this is no longer true for linked loci. Thus, for example, the probability of
the compound event

$$E^{k_1}_{ff}E^{k_2}_{ff} \cdots E^{k_r}_{ff} \tag{7.88}$$

can no longer be obtained by simple multiplication, but will involve the
recombination fractions between loci $k_1, k_2, \ldots, k_r$. The appropriate
generalization of (7.82) for linked loci is found by replacing the typical
term (7.87) by the sum, for all possible r-wise additive and s-wise dominance
contributions, of

$$\left(\tfrac{1}{2}\right)^{r}\,\mathrm{Prob}\Big(\prod_{p=1}^{r}\{E^{k_p}_{ff} + E^{k_p}_{fm} + E^{k_p}_{mf} + E^{k_p}_{mm}\}
 \times \prod_{q=1}^{s}\{E^{l_q}_{ff}E^{l_q}_{mm} + E^{l_q}_{fm}E^{l_q}_{mf}\}\Big)
 \times \sigma^2(k_1, k_2, \ldots, k_r;\ l_1, l_2, \ldots, l_s). \tag{7.89}$$

In the calculation of this probability, all product terms are expanded out
and the probabilities of sums of compound events of the form (7.88)
calculated. This formula, although explicit, yields rather complicated values
when more than two loci are involved. In particular, the expression (7.89)
does not in general provide a contribution to the covariance of the simple
form $c\,\sigma^2_{A^rD^s}$, for some constant c. We thus consider the
application of (7.89) for two loci only, assuming the recombination fraction
between these loci is R. The summations in the various formulas below are over
these two loci.

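Consider first the case of full sibs. Here the events $E_{fm}$ and $E_{mf}$
are impossible (for both loci) and the events $E_{ff}$, $E_{mm}$ have the same
probabilities for both loci. These observations lead to a covariance of

$$\begin{aligned}
&\tfrac{1}{2}\mathrm{Prob}(E^{k}_{ff} + E^{k}_{mm})\sum\sigma_A^2 + \mathrm{Prob}[E^{k}_{ff}E^{k}_{mm}]\sum\sigma_D^2\\
&\quad + \tfrac{1}{4}\mathrm{Prob}[(E^{k}_{ff} + E^{k}_{mm})(E^{l}_{ff} + E^{l}_{mm})]\,\sigma_{AA}^2\\
&\quad + \tfrac{1}{2}\mathrm{Prob}[(E^{k}_{ff} + E^{k}_{mm})E^{l}_{ff}E^{l}_{mm}]\,\sigma_{AD}^2\\
&\quad + \mathrm{Prob}[E^{k}_{ff}E^{k}_{mm}E^{l}_{ff}E^{l}_{mm}]\,\sigma_{DD}^2.
\end{aligned} \tag{7.90}$$

Events involving the subscripts $ff$ and $mm$ are independent. We know
that $\mathrm{Prob}(E^{k}_{ff}) = \mathrm{Prob}(E^{k}_{mm}) = \tfrac{1}{2}$, so
to compute (7.90) it is necessary to compute
$\mathrm{Prob}(E^{k}_{ff}E^{l}_{ff})$. By considering the various cross-over
possibilities it is found that this probability is

$$\tfrac{1}{2} - R + R^2, \tag{7.91}$$

and substitution of this value in (7.90), and the same value for
$\mathrm{Prob}(E^{k}_{mm}E^{l}_{mm})$, leads immediately to the expression in
(2.125).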

It is of some interest to compute half-sib and ancestral line covariances.
For half-sibs, assuming a common father, (7.90) still holds, since $E_{fm}$
and $E_{mf}$ are still both impossible events, but now $E_{mm}$ is also an
impossible event and (7.90) reduces easily to

$$\mathrm{covar}(\text{half-sibs}) = \tfrac{1}{4}\sum\sigma_A^2 + \tfrac{1}{4}\left(\tfrac{1}{2} - R + R^2\right)\sigma_{AA}^2. \tag{7.92}$$

For ancestral lines there are no dominance components to the covariance
but, interestingly, except for the father-son value, these covariances do
depend on R. Thus, for example, (7.85) still applies for all R but, for two
loci, (7.86) must be replaced by

$$\mathrm{covar}(\text{grandfather-grandson}) = \tfrac{1}{4}\sum\sigma_A^2 + \tfrac{1}{8}(1 - R)\,\sigma_{AA}^2. \tag{7.93}$$

While the right-hand sides in (7.92) and (7.93) are identical for unlinked
loci and also for completely linked loci (R = 0), they are not equal for
$0 < R < \tfrac{1}{2}$. For these values of R, linkage causes a greater change
in the grandfather-grandson covariance (compared to the unlinked loci case)
than it does to the half-sib covariance.
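
The dependence of these two-locus coefficients on R can be illustrated by direct enumeration. The sketch below is ours and rests on a simple model that we state as an assumption: each sib receives from the common father a gamete whose origin at the first locus is either of the father's two parental chromosomes with probability 1/2, and whose origin at the second locus is switched with probability R. The grandfather-grandson coefficient $(1 - R)/8$ is taken as the product of the factor 1/4 with the probability $(1 - R)/2$ that the father transmits his own father's genes at both loci.

```python
from fractions import Fraction
from itertools import product


def paternal_ibd_both_loci(R):
    """Probability that two sibs share the father's gene by descent at both loci.

    For each sib, o is the parental origin (0 or 1) of the transmitted gene at
    the first locus and r indicates a recombination before the second locus.
    """
    total = Fraction(0)
    for o1, r1, o2, r2 in product((0, 1), repeat=4):
        weight = Fraction(1, 4)                 # the two origins, 1/2 each
        weight *= R if r1 else 1 - R
        weight *= R if r2 else 1 - R
        same_at_first = o1 == o2
        same_at_second = (o1 ^ r1) == (o2 ^ r2)
        if same_at_first and same_at_second:
            total += weight
    return total


def half_sib_aa_coefficient(R):
    """Coefficient of the additive x additive variance for half-sibs."""
    return Fraction(1, 4) * paternal_ibd_both_loci(R)


def grandfather_grandson_aa_coefficient(R):
    """Coefficient of the additive x additive variance, grandfather and grandson."""
    return Fraction(1, 4) * Fraction(1, 2) * (1 - R)


values = {R: (half_sib_aa_coefficient(R), grandfather_grandson_aa_coefficient(R))
          for R in (Fraction(0), Fraction(1, 4), Fraction(1, 2))}
```

The dictionary `values` holds both coefficients at R = 0, 1/4 and 1/2, so the points at which they coincide and the size of the gap in between can be read off exactly.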

Do simple limiting covariances hold for tightly linked loci? van Aarde
(1975) demonstrated that, as $R \to 0$,

$$\mathrm{covar}(X, Y) \to \alpha\Big(\sum\sigma_A^2 + \tfrac{1}{2}\sigma_{AA}^2\Big)
 + \beta\Big(\sum\sigma_D^2 + \tfrac{1}{2}\sigma_{AA}^2 + \sigma_{AD}^2 + \sigma_{DD}^2\Big), \tag{7.94}$$

where $\alpha$ and $\beta$ are defined in (7.83). This formula can also be
deduced immediately from Schnell (1963). Fisher (1918) asserted that for
linked loci the pattern of covariances for traits depending on two loci would
not differ significantly from those applying for traits depending on one
locus. For tightly linked loci this view is supported by the form of the
expression in (7.94), which is of the same form as that in (7.80) with the
single-locus additive and dominance variances being replaced by the
“generalized” values

$$\sum\sigma_A^2 + \tfrac{1}{2}\sigma_{AA}^2, \qquad
 \sum\sigma_D^2 + \tfrac{1}{2}\sigma_{AA}^2 + \sigma_{AD}^2 + \sigma_{DD}^2 \tag{7.95}$$

respectively. A parallel remark no doubt holds for a number of closely linked
loci. For values of R not close to 0 or $\tfrac{1}{2}$, however, slight
deviations from the pattern suggested by (7.80) do occur, as may be seen by
comparing (7.92) and (7.93). While these covariances are identical at R = 0
and $R = \tfrac{1}{2}$, they differ at $R = \tfrac{1}{4}$ by
$\sigma_{AA}^2/64$, a value that is however probably negligible for most
traits.

Formulae for covariances when the loci involved are not in linkage equilib- 
rium are extremely complicated. Even for two loci the expressions given by 
Gallais (1974) and Weir and Cockerham (1977) contain upward of a hun- 
dred terms, and there can be no hope of estimating the various components 
from even the most extensive data. Thus a useful general formula for
correlations appears to be almost impossible to find. Some progress can,
however, be made by considering special cases. The models considered in
Section 7.3.4 suggest that for large recombination fractions equilibrium
values of the coefficient of linkage disequilibrium are likely to be small:
This is confirmed by simulation for “random” fitness patterns. Thus most
interest attaches to the case of small recombination fractions where, as we
have noted, linkage disequilibrium and linkage equilibrium equilibria are
both sometimes stable.

For two loci and small R we may compare correlations between relatives
by comparing the limiting value (7.94) for the linkage equilibrium case with
the corresponding limiting value found by direct computation when only
two gametes exist, so that there is maximum linkage disequilibrium. It is
found that there is no necessary relation between the two correlations and
that they can differ considerably even for simple models. Thus if the matrix
of measurements is of the simple additive form

$$\begin{array}{l|ccc}
 & A_{21}A_{21} & A_{21}A_{22} & A_{22}A_{22}\\ \hline
A_{11}A_{11} & 1.1 & 1.1 & 0.8\\
A_{11}A_{12} & 1.2 & 1.2 & 0.9\\
A_{12}A_{12} & 1.4 & 1.4 & 1.1
\end{array} \tag{7.96}$$

the correlation between two relatives X and Y is

$$\mathrm{corr}(X, Y) = 0.874\alpha + 0.126\beta \tag{7.97}$$

in the linkage equilibrium case where the gametes $A_{11}A_{21}$,
$A_{11}A_{22}$, $A_{12}A_{21}$ and $A_{12}A_{22}$ have frequencies 0.09, 0.21,
0.21 and 0.49 respectively. Here $\alpha$ and $\beta$ are defined in (7.83).
By contrast, the correlation is

$$\mathrm{corr}(X, Y) = 0.276\alpha + 0.724\beta \tag{7.98}$$

in the case of extreme linkage disequilibrium, where the same four gametes
have frequencies 0.3, 0, 0 and 0.7 respectively.

Although this example represents two extreme cases of linkage
disequilibrium, it is perhaps disquieting that the correlations (7.97) and
(7.98) should be so very different.
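
The coefficients in (7.97) and (7.98) can be reproduced from the table (7.96). The sketch below is our own reading of the calculation, not the author's stated method. In the linkage equilibrium case it uses the fact that the table is a sum of a row effect and a column effect, so that additive and dominance variances can be computed locus by locus and added. In the two-gamete case it treats the pair of gametes $A_{11}A_{21}$ and $A_{12}A_{22}$ as the two alleles of a single "supergene". The function name is ours.

```python
def variance_components(x, m):
    """Additive and dominance variance at one locus under random mating.

    x: allele frequencies; m: symmetric matrix of genotype measurements.
    Under random mating the average effect of allele u is its marginal
    mean minus the overall mean.
    """
    k = range(len(x))
    mean = sum(x[u] * x[v] * m[u][v] for u in k for v in k)
    effects = [sum(x[v] * m[u][v] for v in k) - mean for u in k]
    additive = 2 * sum(x[u] * effects[u] ** 2 for u in k)
    total = sum(x[u] * x[v] * (m[u][v] - mean) ** 2 for u in k for v in k)
    return additive, total - additive


# Table (7.96): rows A11A11, A11A12, A12A12; columns A21A21, A21A22, A22A22.
table = [[1.1, 1.1, 0.8],
         [1.2, 1.2, 0.9],
         [1.4, 1.4, 1.1]]
freq = [0.3, 0.7]  # allele frequencies implied by the gamete frequencies

# Linkage equilibrium: handle each locus separately and add the components.
locus1 = [[table[0][0], table[1][0]], [table[1][0], table[2][0]]]
locus2 = [[table[0][0], table[0][1]], [table[0][1], table[0][2]]]
a1, d1 = variance_components(freq, locus1)
a2, d2 = variance_components(freq, locus2)
le_total = a1 + d1 + a2 + d2
le_coefficients = (round((a1 + a2) / le_total, 3), round((d1 + d2) / le_total, 3))

# Extreme disequilibrium: only gametes A11A21 (0.3) and A12A22 (0.7) exist.
supergene = [[table[0][0], table[1][1]], [table[1][1], table[2][2]]]
a, d = variance_components(freq, supergene)
ld_coefficients = (round(a / (a + d), 3), round(d / (a + d), 3))
```

The pairs `le_coefficients` and `ld_coefficients` are the multipliers of $\alpha$ and $\beta$ in the two cases.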

This observation leads us to a more informal discussion of the correla- 
tion between relatives for a metrical trait determined by many loci and 
also by the environment. So far we have considered only the purely genetic 
contribution to these correlations, and have also assumed random mating 
with respect to the trait of interest. For practical applications it is impor- 
tant to discuss the consequences of assortative mating and environmental 
effects on these correlations, since assortative mating certainly occurs with 
respect to various metrical characters and a shared or a similar environ- 
ment between close relatives implies an environmental component to the 
correlation between relatives. Clearly, any modeling of the environmental 
contribution is hazardous. 

Almost all authors, when considering assortative mating and environmental
effects, have assumed, usually implicitly, that the loci controlling
the trait are unlinked, that complete linkage equilibrium holds, and that
no epistatic effects arise. If these simplifying assumptions hold, then (7.82)
becomes

$$\mathrm{covar}(X, Y) = \alpha\sum\sigma_A^2 + \beta\sum\sigma_D^2, \tag{7.99}$$

where $\alpha$ and $\beta$ are determined in (7.83) and both sums are taken
over all loci involved in determining the trait in question. We rewrite this as

$$\mathrm{corr}(X, Y) = \alpha H + \beta D, \tag{7.100}$$

where $H = \sum\sigma_A^2/\sigma^2$ and $D = \sum\sigma_D^2/\sigma^2$, a
notation more in line with that of biometrical genetics and one we follow for
the remainder of this section. We emphasize the strong assumptions that have
been made in replacing complex equations such as (7.89), which is itself far
less complex than the corresponding equation when linkage equilibrium is not
assumed, by the simple (7.100). In this connection we observe that in the
biometrical literature, for example in Eaves et al. (1977), it is sometimes
implied that absence of assortative mating ensures linkage equilibrium, but
the earlier theory of this chapter shows that this claimed implication is
incorrect.

The question of assortative mating without environmental effects was 
considered by Fisher (1918) and many subsequent authors. We assume a
correlation $r_{pp}$ between mating individuals for the trait in question: The
case $r_{pp} = 0$ leads to equations like (7.100). When there is no dominance,
Nagylaki (1978) arrived by very simple arguments at formulas for the
correlation between relatives. Wright (1921), Crow and Felsenstein (1968) and
Wilson (1973) arrived at extensions of these formulas for the more diffi- 
cult case where dominance exists. Some examples of the correlations of an 
individual with the relative specified are the following: 

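$$\begin{aligned}
\text{corr(with parent)} &= r_{po} = \tfrac{1}{2}(1 + r_{pp})H,\\
\text{corr(with full sib)} &= r_{ss} = \tfrac{1}{2}(1 + r_{pp}H)H + \tfrac{1}{4}D,\\
\text{corr(with $n$th generation grandparent)} &= r_{po}\{\tfrac{1}{2}(1 + r_{pp}H)\}^{n-1},\\
\text{corr(with uncle)} &= r_{un} = \{\tfrac{1}{2}(1 + r_{pp}H)\}^{2}H + \tfrac{1}{8}r_{pp}DH,\\
\text{corr(with first cousin)} &= r_{fc} = \{\tfrac{1}{2}(1 + r_{pp}H)\}^{3}H + D\{\tfrac{1}{4}r_{pp}H\}^{2}.
\end{aligned} \tag{7.101}$$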

More complex formulas for models where selection acts against individuals 
with extreme values of the character have been given by Wilson (1973, 
1976). 

We turn now to environmental effects. These were considered by Fisher
(1918), but only for the simplified model where the phenotypic value P
of the trait can be written as P = G + E, G and E being genetic and
environmental contributions, and for which

$$\sigma_P^2 = \mathrm{var}(P) = \mathrm{var}(G) + \mathrm{var}(E). \tag{7.102}$$

The equation P = G + E implies no interaction between genotype and
environment, while (7.102) further implies no covariance. While these
assumptions may be reasonable as approximations in various controlled
breeding experiments, they are severe assumptions for a trait such as IQ in
humans. The entire work of Burt (1971) on this trait, for example, makes
precisely these assumptions. Burt’s approach is to define
$E = \mathrm{var}(E)/\sigma_P^2$ and to add E to each correlation displayed in
(7.101), where now H and D are redefined using $\sigma_P^2$ instead of
$\sigma^2$ as divisor. Burt also uses a variance term for assortative mating
but this appears to be unjustified and is not followed by other authors. If
any two correlations can be estimated from data, these formulas then give H,
D and E, since H + D + E = 1. Burt uses the three correlations $r_{po}$,
$r_{pp}$ and the correlation $(1 + E)^{-1}$ of identical twins, since under
his model a further correlation is required to estimate the “assortative
mating” component of variance.

A more realistic model clearly allows for both genotype-environment 
interaction and covariance, and also distinguishes the almost certainly 
different environmental correlations for individuals of different degrees of 
relatedness. Various models, of increasing complexity, have been put
forward to this end, largely in the psychological literature. Jinks and Eaves
(1974) assumed an added environmental correlation only for those
individuals in the same family, that is, for parents and offspring living
together; this applies for the first two correlations in (7.101). A revised
model adds a correlation only for sibs raised together. Eaves et al. (1977)
presented a model with two environmental correlations, $E_1$ for “within
families” and $E_2$ for “between families”: Here $E_1$ is added to the first
two correlations in (7.101) and $E_2$ to the last three.

With empirical values for correlations between a variety of relatives, 
involving perhaps 10-15 relationships, the parameters of any reasonably 
simple model can be estimated by least squares and the goodness of fit of 
the model tested by chi-square. The argument in this procedure is that if 
a fit is not rejected by chi-square the model can be accepted and a more 
complex model is not needed. Perhaps the major difficulty with this pro- 
cedure is the low power of the goodness-of-fit procedure. Thus Last (1976) 
found that for reasonable parameter values sample sizes in excess of 5,000 
are required to be fairly certain of detecting genotype-environment interac- 
tions of some magnitude. The question of the adequacy of various models 
and of the above fitting procedure has led to much acrimonious discus- 
sion in the literature: See Eaves et al. (1977), Mather and Jinks (1977) 
and Goldberger ( 1978a, b) for summarizing and strongly contrasting views. 
Our interest here is more in the genetic aspects, and we conclude by em- 
phasizing that essentially all models used in the biometrical literature, on 
all sides, have used simple formulas, such as (7.99), for the purely genetic 
component of correlation and that in practice far more complex formulas 
are surely required. While simplifying assumptions must be made in these 
analyses at some level, it is not yet certain that the level chosen, that is 
that leading to (7.99), is a satisfactory one. The entire question of using 
correlations to estimate heritability must be viewed with great caution. 

7.6 Summary 

The two main conclusions reached in this chapter are, first, that opposing 
views may reasonably be held about the likely extent of linkage disequi- 
librium in natural populations and, second, that the degree of linkage 
disequilibrium that does occur will alter, perhaps significantly, the numeri- 
cal values of several important population genetic parameters. As a result, 
the extent of linkage disequilibrium that can be expected in natural pop- 
ulations will affect one’s view of the likely dynamic and static behavior of 
these populations. 

The view that extensive linkage disequilibrium might occur in natural 
populations has been promoted in particular by Franklin and Lewontin 
(1970) and Lewontin (1974). This view is supported in some special cases 
by other authors, for example in the case of disruptive selection by Bulmer 
(1976). While Franklin and Lewontin show that the correlation properties 
in multilocus systems can build up linkage disequilibrium values in excess 
of those predicted by two-locus theory, their analysis applies almost ex- 
clusively for multiplicative selective models. Our analysis shows that for 
many other models, rather less linkage disequilibrium can be expected. 
Even under the multiplicative scheme, stable equilibria with no linkage 
disequilibrium can exist simultaneously with stable equilibria with high 
linkage disequilibrium, and it is not certain in general what the domains 
of attraction of the two types are. For stabilizing selection schemes the 
numerical calculations of Bulmer (1976) confirm the prediction of Wright 
(1965b) that the true additive genetic variance is somewhat less than the 
value computed assuming no disequilibrium, but the decrease is only of 
order 25% in Bulmer’s simulations. 

The theory described above does not take into account the possibility 
that linkage disequilibrium can be due to factors other than selection, in 
particular geographical structure, as discussed below in Section 8.4. 
The existence of so-called “haplotype blocks”, especially in humans, indi- 
cates that substantial linkage disequilibrium exists in humans, and this is 
of particular interest to human geneticists wishing to map disease genes. 
These blocks might be due to selection, to geographical factors, or simply to 
the existence of recombinational hot-spots and nonrecombinational “cold 
regions”. The haplotype block concept, and mathematical aspects of the 
problem of mapping disease genes, will be considered in detail in Volume 
II. 

We have already recorded in the summary of Chapter 6 the view of Ohta 
and Kimura (1975) that on the whole, linkage disequilibrium is generally 
small in natural populations, and to a first order can be ignored: Broadly 
speaking, this view is also given in Crow and Kimura (1970). Even to this 
day, no uniform view yet exists on this point. 

If extensive linkage disequilibrium does occur, the genetic properties of 
a population can be quite complex. The mean fitness increase theorem 
need not hold, although one suspects that a generalization of the concept 
of quasi-linkage equilibrium ensures that mean fitness “mostly” increases. 
The mean fitness of a population with strong linkage disequilibrium can be 
quite high, with only a comparatively small number of possible genotypes 
represented with non-negligible frequencies. However, such a structured 
system has less flexibility and thus capacity to cope with altered environ- 
mental conditions than does an unstructured population. The true additive 
genetic variance may be rather less or considerably more than that calcu- 
lated without linkage disequilibrium, and in general the properties of the 
genetic system cannot easily be found by combining single-locus analyses. 
The correlation between relatives for measurable characteristics is also af- 
fected by the degree of linkage disequilibrium between the loci determining 
these characteristics. 

Many of the general multilocus results for the case of a random-mating 
population continue to hold in the non-random-mating case, but some do 
not. The Fundamental Theorem of Natural Selection holds whether mating 
is random or non-random. 

We conclude with two remarks concerning Chapters 9 and 11. In Chapter 
9 models are considered that recognize the gene as a sequence of nucleotides. 
The theory of the present chapter is relevant to this analysis, with the nu- 
cleotide replacing the gene as the fundamental unit. Of course entirely new 
values for mutation rates, recombination fractions and selective differentials 
are appropriate at the molecular level, and this will alter the way in which 
we view certain formulas involving these parameters. Second, in Chapter 
11 various tests of the “neutral theory” are described. These tests consider 
the hypothesis that the alleles at some locus are selectively equivalent, and 
use only the frequencies of these alleles in the testing procedure. The the- 
ory thus ignores the possible effects on these frequencies of closely linked 
selective loci. If extensive linkage disequilibrium does occur gene frequen- 
cies reflect more the selective forces acting on segments of the chromosome 
rather than single loci, and thus much of the testing theory described in 
Chapter 11 becomes invalid under these circumstances. 




8 

Further Considerations 



8.1 Introduction 

In line with the aims of this book, the material given in preceding chapters 
has focused on purely mathematical considerations. However many top- 
ics in population genetics generally require an extensive verbal discussion 
with perhaps a minimum of mathematical treatment. Some of these topics 
are discussed in this chapter. Some mathematical material not naturally 
addressed in previous chapters is also taken up here. The content of this 
chapter may be viewed as an introduction to a more complete discussion, 
in Volume II, of the topics considered here. 



8.2 What is Fitness? 

In most of this book “fitness” has been taken to mean viability fitness, 
that is as a measure of the capacity of an individual of a given genotype to 
survive from the time of conception to the age of reproduction. In Section 
2.6 fertility selection was discussed briefly, but nowhere have we considered 
the fitness component relating to mating success. Fitness properly embraces 
all three aspects and, to be considered in greater detail, should involve the 
population age distribution as well as ecological and other factors, very 
few of which are mentioned in the literature in any detail. Indeed the con- 
cept of fitness is possibly too complex to allow of a useful mathematical 
development. Since it enters fundamentally into many population genetics 
considerations, it is remarkable how little attention has been paid to it. 

The most complete discussion of the concept of fitness is that of 
Kempthorne and Pollak (1970), who emphasize several points worth 
repeating. First, while it is uniformly agreed that fitness is a property of the 
entire genome of an individual, it is also apparently agreed, with Wright 
(1931), that to a first approximation, for a short time, a constant net selec- 
tive value of any allele may usefully be defined. The concept of the marginal 
fitness value for any one-locus genotype, introduced in (7.46), and the fact 
that these values do not usually lead to the correct values of the additive 
effects of the alleles at the locus at which these marginal fitnesses are defined, 
are relevant factors in this viewpoint. However, evolutionary arguments re- 
quire consideration of very long time periods, and it is not certain that the 
approximation of a constant net selective value of any allele is adequate for 
long-term considerations. The analysis of the interactive effects of genes 
at several loci given in Chapters 6 and 7 is relevant to this point. Second, 
the lack of connection between the Fisherian definition of fitness through 
Malthusian parameters and the bulk of mathematical evolutionary theory 
is unfortunate. Finally the very concept of fitness, in particular when fe- 
cundity parameters are not in the multiplicative form (2.28), appears to be 
elusive. 



8.3 Sex Ratio 

The problem of the evolution of the sex ratio was discussed briefly in Section 
1.5, and the “non-genetic” argument of Fisher concerning this evolution was 
noted. A number of genetically-based arguments for sex ratio adjustment 
have been put forward since Fisher’s time: See in particular Shaw and 
Mohler (1953), Shaw (1958) and Eshel (1975). For theoretical analysis of 
Fisher’s theory see Kolman (1960), Bodmer and Edwards (1960), Edwards 
(1961) and Verner (1965). 

We now briefly describe an approach to this problem by Uyenoyama 
and Bengtsson (1979) which is of particular interest, since the analysis 
is genetically based and is close in spirit to Fisher’s original argument. 
Consider, in a diploid population, an autosomal locus admitting alleles 
$A_1$ and $A_2$ and hence genotypes $A_1A_1$, $A_1A_2$ and $A_2A_2$, which we call 
genotypes 1, 2, 3 respectively. The frequency of males (females) of genotype 
$i$ is $m_i$ ($f_i$), and $\sum m_i = \sum f_i = 1$. We make two assumptions concerning 
the female genotypes. The first is that different genotypes have different 
brood sizes: Suppose females of genotype $i$ have brood size proportional to 
$\sigma_i$. Second, we suppose that the sex ratio among offspring depends on the 
maternal genotype, and that females of genotype $i$ produce a fraction $\alpha_i$ 
of male offspring and $1 - \alpha_i$ of female offspring. 

The values $\sigma_i$ and $\alpha_i$ specify the recurrence relations of the $f_i$ and $m_i$. 
We are particularly interested in the equilibria of these recurrence relations 
and will consider these only in the particular case where the condition 

$$\sigma_1\sigma_2(\alpha_1 - \alpha_2) + \sigma_2\sigma_3(\alpha_2 - \alpha_3) + \sigma_3\sigma_1(\alpha_3 - \alpha_1) = 0 \tag{8.1}$$

holds. (We show later that an argument using the parental expenditure 
concept leads to (8.1).) The recurrence relations have three equilibria, one 
of which is symmetric ($m_i = f_i$) and the others asymmetric ($m_i \neq f_i$). We 
focus attention here on the asymmetric equilibria. 

The frequencies M of males and F of females in the population are 

$$M = \frac{\sum_i f_i\sigma_i\alpha_i}{\sum_i f_i\sigma_i}, \qquad F = \frac{\sum_i f_i\sigma_i(1 - \alpha_i)}{\sum_i f_i\sigma_i}. \tag{8.2}$$

Shaw and Mohler (1953) and others define the mean fitness W for this 
model as 

W = Y, + (1 - a t )F}. (8.3) 

Subject to $\sum f_i = 1$, $W$ may be maximized with respect to the $f_i$, and, 
assuming that (8.1) holds, the maximizing values are found to occur pre- 
cisely at the asymmetric equilibria of the system. In other words, if the 
system evolves towards such an equilibrium, it is evolving in such a way as 
to optimize the sex ratio as measured by the mean fitness W. It is found 
that the optimizing value of the sex ratio is 

$$F/M = \{\sigma_1(1 - \alpha_1) - \sigma_2(1 - \alpha_2)\}/\{\sigma_2\alpha_2 - \sigma_1\alpha_1\}. \tag{8.4}$$

It remains for us to justify that (8.1) will be true under the parental ex- 
penditure concept. Suppose the ratio of the parental expenditure required 
to raise a female offspring to maturity compared to that to raise a male 
offspring is $\phi : 1$. The mean expenditure per offspring of a female of genotype 
$i$ is then proportional to $\alpha_i + \phi(1 - \alpha_i)$. If females of all genotypes make 
the same total expenditure for their entire brood, then the brood size $\sigma_i$ 
must satisfy the requirement 

$$\sigma_i = K\{\alpha_i + \phi(1 - \alpha_i)\}^{-1} \tag{8.5}$$

for some constant $K$. Values of $\sigma_i$ satisfying (8.5) automatically satisfy 
(8.1), and furthermore lead to an $F/M$ value of $1/\phi$, as might be 
expected. Thus the optimal sex ratio is determined entirely by the relative 
expenditure in rearing male and female offspring. 
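
The claim that brood sizes of the parental expenditure form satisfy the equilibrium condition can be checked by direct arithmetic. The following is a minimal sketch, assuming that (8.1) reads $\sigma_1\sigma_2(\alpha_1-\alpha_2)+\sigma_2\sigma_3(\alpha_2-\alpha_3)+\sigma_3\sigma_1(\alpha_3-\alpha_1)=0$, that (8.4) is the ratio $\{\sigma_1(1-\alpha_1)-\sigma_2(1-\alpha_2)\}/\{\sigma_2\alpha_2-\sigma_1\alpha_1\}$, and that (8.5) is $\sigma_i = K\{\alpha_i+\phi(1-\alpha_i)\}^{-1}$. The function names and the numerical values of $\alpha_i$ and $\phi$ are illustrative choices, not values from the literature.

```python
from fractions import Fraction

def brood_sizes(alphas, phi, K=Fraction(1)):
    """Brood sizes from the assumed form of (8.5): sigma_i = K / (alpha_i + phi*(1 - alpha_i))."""
    return [K / (a + phi * (1 - a)) for a in alphas]

def condition_8_1(sigmas, alphas):
    """Left-hand side of the assumed form of (8.1); it should vanish."""
    s1, s2, s3 = sigmas
    a1, a2, a3 = alphas
    return s1 * s2 * (a1 - a2) + s2 * s3 * (a2 - a3) + s3 * s1 * (a3 - a1)

def optimal_ratio_8_4(sigmas, alphas):
    """F/M from the assumed form of (8.4), using genotypes 1 and 2."""
    s1, s2, _ = sigmas
    a1, a2, _ = alphas
    return (s1 * (1 - a1) - s2 * (1 - a2)) / (s2 * a2 - s1 * a1)

# Illustrative (hypothetical) values: fraction of sons per genotype, and a
# female:male rearing cost ratio phi : 1.
alphas = [Fraction(3, 10), Fraction(1, 2), Fraction(7, 10)]
phi = Fraction(3, 2)

sigmas = brood_sizes(alphas, phi)
lhs = condition_8_1(sigmas, alphas)
ratio = optimal_ratio_8_4(sigmas, alphas)
print("left-hand side of (8.1):", lhs)
print("F/M from (8.4):", ratio, " 1/phi:", 1 / phi)
```

Exact rational arithmetic is used so that the vanishing of the left-hand side of (8.1) is not obscured by rounding. With daughters assumed to be the more expensive sex ($\phi > 1$), the resulting ratio is biased towards sons, in line with the equal-expenditure argument.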



8.4 Geographical Structure 

The importance of geographical structure to the evolutionary theories of 
Wright (1931, 1969a, b, 1977, 1978) was noted in Chapter 1. The effects of 
structure have been considered at some length in the literature and here 
we refer, very briefly, to several features of the analyses made. 

The two main types of model of geographical structure considered in 
the literature are an “island” model of distinct subpopulations, or demes, 
and a continuous cline model in one or sometimes two dimensions. 
A standard fact about an island model is that if random mating obtains 
within any island, the fraction of individuals who are heterozygotes at a 
single locus admitting two alleles is necessarily less than or equal to the 
corresponding fraction for a large random-mating population with the same 
mean allelic frequencies. If the frequency of $A_1$ in island $i$ is $x_i$, and island $i$ 
comprises a fraction $f_i$ of the entire population, this (Wahlund) inequality 
can best be demonstrated through the equation 

$$2\bar{x}(1 - \bar{x}) = 2\sum_i f_i x_i(1 - x_i) + 2\sum_i f_i(x_i - \bar{x})^2, \tag{8.6}$$

where $\bar{x} = \sum_i f_i x_i$. We use this fact below when considering mean fixation 
times in stochastic models involving geographical structure. 
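
The decomposition in (8.6) is easily checked numerically. The sketch below assumes that (8.6) states that the heterozygote frequency of a pooled random-mating population equals the mean within-island heterozygote frequency plus twice the between-island variance of the allele frequency. The function name and the island values are hypothetical.

```python
def wahlund_terms(fractions, freqs):
    """Return (pooled, within, between) for the assumed form of (8.6).

    pooled  = 2 * xbar * (1 - xbar)
    within  = 2 * sum_i f_i * x_i * (1 - x_i)
    between = 2 * sum_i f_i * (x_i - xbar)**2
    """
    xbar = sum(f * x for f, x in zip(fractions, freqs))
    pooled = 2 * xbar * (1 - xbar)
    within = 2 * sum(f * x * (1 - x) for f, x in zip(fractions, freqs))
    between = 2 * sum(f * (x - xbar) ** 2 for f, x in zip(fractions, freqs))
    return pooled, within, between

# Hypothetical example: three islands of unequal size.
island_fractions = [0.5, 0.3, 0.2]
island_freqs = [0.8, 0.4, 0.1]

pooled, within, between = wahlund_terms(island_fractions, island_freqs)
print("pooled heterozygosity :", pooled)
print("within-island average :", within)
print("between-island term   :", between)
```

Since the between-island term is a weighted sum of squares it cannot be negative, which gives the Wahlund inequality directly; it vanishes only when all islands share the same allele frequency.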

The Wahlund effect concerns the frequencies of the genotypes at a sin- 
gle gene locus. The effects of geographical subdivision are perhaps more 
important in the analysis of multilocus systems, and we now consider one 
such analysis, of particular interest to human geneticists, where the joint 
frequencies of the genotypes at two loci are of interest. To be concrete we 
assume alleles $A_1$ and $A_2$, with frequencies $x$ and $1 - x$, at the first locus 
and $B_1$ and $B_2$, with frequencies $y$ and $1 - y$, at the second locus. We also 
assume that the two loci are unlinked. 

Suppose that a proportion $f_1$ of a population lives on one island and a 
proportion $f_2 = 1 - f_1$ lives on another. The frequencies of $A_1$ and $A_2$ in 
the first island are assumed to be $x_1$ and $1 - x_1$ and in the second island $x_2$ 
and $1 - x_2$. Similarly the frequencies of $B_1$ and $B_2$ in the first island 
are assumed to be $y_1$ and $1 - y_1$ and in the second island $y_2$ and $1 - y_2$. 

We assume random mating within each island, that a sufficiently long 
time has passed so that a stationary situation has been reached, and also 
that there is no selection at the two loci of interest. Then linkage equilibrium 
exists within each island, so that for example within island 1 the frequency 
of the gamete $A_1B_1$ (more frequently called “haplotype” in human genetics) 
is $x_1y_1$. Thus the overall frequency of the gamete $A_1B_1$ is 
$c_1 = f_1x_1y_1 + f_2x_2y_2$, with similar calculations leading to overall frequencies 
$c_2$, $c_3$ and $c_4$ of $A_1B_2$, $A_2B_1$ and $A_2B_2$. 

Even though linkage equilibrium holds within any island, it does not 
necessarily hold over the entire population, so that it is not necessarily the 
case that $c_1c_4 - c_2c_3 = 0$. For example, suppose that $f_1 = f_2 = 1/2$, so 
that the two islands have equal population sizes, that the frequencies of the 
gametes $A_1B_1$, $A_1B_2$, $A_2B_1$ and $A_2B_2$ in island 1 are 0.56, 0.24, 0.14 and 
0.06 and in island 2 are 0.02, 0.08, 0.18 and 0.72. Then linkage equilibrium 
holds within each island, but the overall gametic frequencies $c_1 = 0.29$, 
$c_2 = 0.16$, $c_3 = 0.16$ and $c_4 = 0.39$ are far from satisfying the linkage equilibrium 
requirement $c_1c_4 - c_2c_3 = 0$. 
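
The arithmetic of this two-island example can be reproduced as follows. This is a minimal sketch: the allele frequencies $(x_1, y_1) = (0.8, 0.7)$ and $(x_2, y_2) = (0.1, 0.2)$ are not stated in the text but are inferred from the gametic frequencies quoted for the two islands, and the function names are hypothetical.

```python
def pooled_gametes(weights, islands):
    """Pooled frequencies (c1, c2, c3, c4) of A1B1, A1B2, A2B1, A2B2.

    Each island is given as (x, y), the frequencies of A1 and B1, and is
    assumed to be in linkage equilibrium internally.
    """
    c = [0.0, 0.0, 0.0, 0.0]
    for f, (x, y) in zip(weights, islands):
        within = [x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y)]
        c = [ci + f * g for ci, g in zip(c, within)]
    return c

def disequilibrium(c):
    """Coefficient of linkage disequilibrium c1*c4 - c2*c3."""
    return c[0] * c[3] - c[1] * c[2]

weights = [0.5, 0.5]
islands = [(0.8, 0.7), (0.1, 0.2)]  # inferred from the quoted gametic frequencies

c = pooled_gametes(weights, islands)
D = disequilibrium(c)
print("pooled gametic frequencies:", [round(ci, 4) for ci in c])
print("c1*c4 - c2*c3 =", round(D, 4))
```

For two islands, each in linkage equilibrium, the pooled disequilibrium reduces to $f_1f_2(x_1 - x_2)(y_1 - y_2)$, so that it vanishes only if the islands agree in allele frequency at one or other of the two loci.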

This calculation shows that geographical structure can lead to linkage 
disequilibrium. The extent to which, in humans, linkage disequilibrium be- 
tween two loci might be due to geographical structure is controversial. This 
matter is discussed again in Section 10.8, where tests of linkage based on 
association will be discussed. 

We turn now to stochastic behavior. One interesting class of problems 
concerns quantities that are not affected by geographical structure. For a 
finite population, Maruyama (1970, 1971, 1974) found two such quantities, 
at least for selectively neutral loci and certain genetic models. The first of 
these is the probability of fixation of a given allele, and the second is the 
mean total number of heterozygotes to appear as a result of a single new 
mutation. It then follows automatically from (8.6) that the mean time to 
fixation in the subdivided case is larger than that in the undivided case, 
and this was confirmed (Maruyama (1971)) by simulation. In the former 
case, on average, a smaller number of heterozygotes appears per generation, 
but for a greater number of generations, than in the latter case. 

The eigenvalues in a genetic model involving geographical subdivision 
were given in (3.126), and the consequent effective population size was 
noted in (3.127). Except for very small migration rates, this effective size 
does not differ much from the actual population size. We might then be 
tempted to conclude that in this model the effect of subdivision is not 
important, and that for many purposes the population can be taken as 
one large random-mating population. Whether this view is correct or not 
is relevant to the evolutionary theory of Wright, depending as it does to 
some extent on population subdivision. However, in the model leading to 
(3.126), the migration is isotropic, and a much less extreme conclusion 
holds for structured populations where migration is most likely to occur to 
and from neighboring sub-populations. 

Further eigenvalue questions have been discussed by Maruyama (1970, 
1971, 1972), Nagylaki (1974b, 1976, 1977c), and Kimura and Maruyama 
(1971). Kimura and Maruyama also note one further important result: 
Even in the selectively neutral case clines of gene frequency can occur, 
the propensity for this depending on the subpopulation sizes and migra- 
tion rates. Slatkin and Maruyama (1975) discuss the effect of stochastic 
gene frequency fluctuations on the slope of gene frequencies in a cline and 
show that the slope is decreased through such fluctuations. 

A second form of stochastic fluctuation occurs in infinite populations 
where the migration rate and the gene frequencies of immigrants into any 
subpopulation are random variables. We do not discuss this case here: 
Details of a model analyzing it were given by Nagylaki (1979). 

A considerable literature also exists on the deterministic theory of 
geographically structured populations, originating with the remarkable pi- 
oneering paper of Fisher (1937). Of particular interest, in view of the theory 
of Chapter 6, is the fact that association between frequencies of alleles at 
different loci can be generated by geographical subdivision, even in the 
absence of epistatic interactions between these loci. We consider here, how- 
ever, an example of two questions specific to geographic subdivision, namely 
whether selection can maintain a cline of gene frequencies from west to 
east if $A_1$ is favored in the west and $A_2$ in the east, and second whether the 
frequency of $A_1$ can be sustained at positive values when $A_1$ is favored only 
in a finite interval of the entire east-west line. These questions have been 
considered in particular by Nagylaki (1975), and we follow his analysis of 
them closely. 

Consider the line $[L, R]$, where possibly $L = -\infty$ or $R = +\infty$, and 
suppose at the point $x$ on this line that the fitnesses of $A_1A_1$, $A_1A_2$ and 
$A_2A_2$ are $1 + sg(x)$, $1 + hsg(x)$ and $1 - sg(x)$. (In this notation $h = 0$ 
corresponds to no dominance.) The small parameter $s$ is a measure of the 
strength of selection. Each individual is assumed to migrate, from the time 
of birth to the time of reproduction, by a random amount $y$, where $y$ has 
a normal distribution with mean 0, variance $\sigma^2$. The migration distances 
of different individuals are assumed to be independent. The frequency 
$p = p(x, t)$ of $A_1$ at the point $x$ at time $t$ then satisfies the partial differential 
equation 



$$\frac{\partial p}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 p}{\partial x^2} + sg(x)p(1 - p)(1 + h - 2hp) \tag{8.7}$$

together with the boundary conditions dp/dx = 0 at x = L, x = R. This is 
a generalization of the formula of Fisher (1937). If a stationary cline exists 
there must be a solution of (8.7) with dp/dt = 0. In the important case 
h = 0, (8.7) shows that the equilibrium cline equation is 

$$\frac{d^2 p}{dx^2} + \frac{2s}{\sigma^2}g(x)p(1 - p) = 0. \tag{8.8}$$

The appropriate solution to this equation is found by using the boundary 
conditions dp/dx = 0 at x = L, x = R. This equation always has the trivial 
solutions p(x) = 0, p(x) = 1, and our aim is to find conditions for nontrivial 
solutions. The form of any solution will clearly depend on the numerical 
value of the parameter $2s/\sigma^2$ as well as the nature of $g(x)$. In the particular 
case where $L = 0$, $R = +\infty$, $s > 0$, and 



$$g(x) = \begin{cases} 1, & 0 < x < a, \\ -\alpha^2, & a < x < \infty, \end{cases} \tag{8.9}$$



so that $A_1$ is favored when $x < a$ and $A_2$ when $x > a$, Nagylaki found that 
the necessary and sufficient condition for a unique nontrivial solution of 
(8.8) is 



$$a\sqrt{2s} > \sigma \arctan \alpha. \tag{8.10}$$

Clearly the inequality (8.10) will apply for sufficiently large values of $a$ 
and $s$ and will not apply for sufficiently large $\sigma$, and, to a much smaller 
extent, $\alpha$, since the right-hand side in (8.10) increases only from $0.78\sigma$ to 
$1.57\sigma$ as $\alpha$ increases from 1 to $\infty$, but increases linearly with the migration 
distribution standard deviation $\sigma$. Clearly, while an inequality of the general 
form (8.10) is to be expected intuitively, the particular form of (8.10) is 
perhaps surprising, and indicates explicitly the relevance of the various 
parameters involved to the maintenance of $A_1$. 
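
The dependence of the condition on its parameters may be explored numerically. The sketch below assumes that (8.10) reads $a\sqrt{2s} > \sigma \arctan \alpha$, with $a$ the width of the favored region, $s$ the selection intensity, $\sigma$ the migration standard deviation and $\alpha$ measuring the strength of selection against $A_1$ outside that region. The function names and parameter values are hypothetical.

```python
import math

def allele_maintained(a, s, sigma, alpha):
    """Check the assumed form of (8.10): a*sqrt(2s) > sigma*arctan(alpha)."""
    return a * math.sqrt(2 * s) > sigma * math.atan(alpha)

def critical_width(s, sigma, alpha):
    """Smallest width a at which the assumed inequality (8.10) becomes an equality."""
    return sigma * math.atan(alpha) / math.sqrt(2 * s)

# The right-hand side coefficient arctan(alpha) only ranges from about 0.78
# (alpha = 1) up to about 1.57 (alpha -> infinity).
print(round(math.atan(1.0), 2), round(math.atan(1e9), 2))

# Hypothetical values: weak selection, unit migration standard deviation.
s, sigma, alpha = 0.01, 1.0, 1.0
a_crit = critical_width(s, sigma, alpha)
print("critical width:", round(a_crit, 3))
print("a = 6 maintains A1:", allele_maintained(6.0, s, sigma, alpha))
print("a = 5 maintains A1:", allele_maintained(5.0, s, sigma, alpha))
```

The critical width scales linearly with $\sigma$ and as $s^{-1/2}$, but at most doubles as $\alpha$ increases from 1 without limit, which is the asymmetry noted above.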

This analysis can be taken over immediately to the case of a region, or 
“pocket”, in which $A_1$ is favored. By reflecting the interval $(0, a)$ about 
$x = 0$, we find that if 



$$g(x) = \begin{cases} 1, & -a < x < a, \\ -\alpha^2, & |x| > a, \end{cases} \tag{8.11}$$



so that $A_1$ is favored in the “pocket” $(-a, a)$ but not elsewhere, the 
condition that $A_1$ can be maintained in the population is again (8.10). 

Further analyses can be made for other functional forms for g(x), but 
we do not pursue the details. Analyses of this sort are relevant to Wright’s 
theory of evolution and also to questions concerning theories of allopatric 
and sympatric models of speciation, as discussed for example by White 
(1978). 

The above gives only a brief introduction to the complex theory sur- 
rounding geographically structured populations. Many further theoretical 
results are given by Nagylaki (1992), while Epperson (2003) provides a 
general discussion of the geographically structured case. These will be 
considered in more detail in Volume II. 



8.5 Age Structure 

So far all our analyses have ignored age structure. Perhaps curiously, the 
effect of age structure has been considered more frequently in the math- 
ematical ecology literature than in the mathematical population genetics 
literature. To take account of age structure one must specify age-specific 
reproductive and survival schedules for all genotypes of both sexes, and 
must also make assumptions concerning the sex ratio and the mating 
process. While Norton (1928) and Haldane (1927) considered age-structured 
populations many years ago, it is only recently that further attention has 
been paid to them in any detail (see, for example, Demetrius (1971, 1974, 
1975, 1976, 1977) and Charlesworth (1970, 1971, 1972, 1973, 1974)). An 
excellent summary of the topic is given by Charlesworth (1976). 

Perhaps the most important aim, for age-structured populations, is to 
establish natural definitions of fitness that allow much of the classical theory 
to be applied. In this direction, Charlesworth (1976) found that, under 
certain conditions, a natural definition of the fitness of any genotype exists, 
and that with this definition, natural selection leads to equilibria of the form 
(1.31) in the overdominance case and to dynamical equations of the form 
(1.26) if one allele is becoming fixed in the population. Similarly, Demetrius 
(1974) has given an analogue of the mean fitness increase theorem for age- 
structured populations. It thus appears likely that age structure does not 
introduce radically new behavior in populations compared to that expected 
from classical analyses. For this reason, perhaps inappropriately, we do not 
consider it in any further detail in this book. 



8.6 Ecological Considerations 

There is now a vast literature on the mathematical theory of ecological pro- 
cesses, including static and dynamical theories of the growth of a number 
of interacting populations. May (1975, 1976) and Pielou (1974) provide 
summaries of this literature. In particular the discrete-time Lotka-Volterra 
equation 

$$N_i(t + 1) = N_i(t)\Big(1 + \alpha_i - \sum_j \beta_{ij} N_j(t)\Big), \tag{8.12}$$

($i = 1, \ldots, k$), modeling the dynamics of a community of $k$ populations of 
respective sizes $N_1, \ldots, N_k$, has been extensively analyzed. While a 
considerable verbal discussion exists in the literature on the relation between 
population genetics and ecology, rather less mathematical theory exists 
on this relationship. Thus the model (8.12) as it stands is free of genetic 
considerations. 
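
The purely ecological recurrence is simple to iterate. The following sketch assumes that (8.12) has the form $N_i(t+1) = N_i(t)\{1 + \alpha_i - \sum_j \beta_{ij}N_j(t)\}$. The symbols `alpha` and `beta`, the function names and all numerical values are hypothetical, chosen so that the two-species equilibrium $(375, 250)$ is locally stable.

```python
def lotka_volterra_step(N, alpha, beta):
    """One generation of the assumed discrete-time Lotka-Volterra recurrence (8.12)."""
    k = len(N)
    return [
        N[i] * (1.0 + alpha[i] - sum(beta[i][j] * N[j] for j in range(k)))
        for i in range(k)
    ]

def iterate(N, alpha, beta, generations):
    """Apply the recurrence repeatedly and return the final population sizes."""
    for _ in range(generations):
        N = lotka_volterra_step(N, alpha, beta)
    return N

# Hypothetical two-species competition community.
alpha = [0.5, 0.4]
beta = [[0.001, 0.0005],
        [0.0004, 0.001]]

N_final = iterate([100.0, 100.0], alpha, beta, 500)
print([round(n, 3) for n in N_final])
```

The equilibrium sizes solve the linear system $\sum_j \beta_{ij}N_j = \alpha_i$, here $N_1 = 375$, $N_2 = 250$. Nothing in the iteration refers to genotypes, which is the sense in which the model as it stands is free of genetic considerations.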

However, if the parameters $\beta_{ij}$ depend on the genetic constitutions of 
populations $i$ and $j$, a description of the evolution in the model (8.12), 
concurrent with a description of the genetic evolution in the various pop- 
ulations, is possible in principle although no doubt normally difficult in 
practice. In this section we outline the analysis of Roughgarden (1976, 
1977) of such a joint model: For analyses of related models see Fenchel and 
Christiansen (1977), Jayakar (1970) and Yu (1972). 

Consider first the case of a single species in isolation and write (8.12) in 
the form 



$$N(t + 1) = N(t)\bar{w}, \tag{8.13}$$

where $\bar{w}$, the absolute mean fitness of the species, is a measure of the rate 
of increase in numbers of this species. Suppose that $\bar{w}$ is determined by the 
alleles $A_1$ and $A_2$ at a given locus. If $A_1$ has frequency $x$, then from the 
elementary theory developed in Chapter 1, 

$$\bar{w} = w_{11}x^2 + 2w_{12}x(1 - x) + w_{22}(1 - x)^2, \tag{8.14}$$

where $w_{ij}$ is the absolute fitness of $A_iA_j$. The $w_{ij}$ themselves are assumed 
to depend on the current size N of the population. Thus, quite apart from 
the evolution of the population size $N$, there will be genetic evolution at the 
$A$ locus determined by the standard equation (1.26). This equation, coupled 
with (8.13), then defines, for given functions $w_{ij}(N)$, the entire 
genetical-ecological evolution of the population. Anderson (1971) and more generally 
Roughgarden (1976) prove that this evolution is such that the equilibrium 
frequency $x^*$ of $A_1$ is that producing the largest equilibrium population 
size and that also maximizes mean fitness. This is the first principle of 
genetical-ecological systems as enunciated by Roughgarden (1976). 
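
This principle can be illustrated by simulation under one particular choice of density-dependent fitnesses. The sketch below is not the general analysis referred to above: it assumes the logistic form $w_{ij}(N) = 1 + r(1 - N/K_{ij})$, with hypothetical growth rate $r$ and carrying capacities $K_{ij}$, and takes the genetic recurrence to be the standard one-locus equation $x' = x\{w_{11}x + w_{12}(1 - x)\}/\bar{w}$.

```python
# Hypothetical carrying capacities K_ij and growth rate r (assumptions of this sketch).
K = {"11": 900.0, "12": 1000.0, "22": 800.0}
R_GROWTH = 0.5

def genotype_fitnesses(N, r=R_GROWTH, K=K):
    """Assumed logistic fitnesses w_ij(N) = 1 + r * (1 - N / K_ij)."""
    return {g: 1.0 + r * (1.0 - N / k_g) for g, k_g in K.items()}

def step(N, x):
    """One generation of the joint recurrence for size N and frequency x of A1."""
    w = genotype_fitnesses(N)
    wbar = w["11"] * x * x + 2 * w["12"] * x * (1 - x) + w["22"] * (1 - x) ** 2  # (8.14)
    x_next = x * (w["11"] * x + w["12"] * (1 - x)) / wbar  # one-locus selection recurrence
    N_next = N * wbar  # (8.13)
    return N_next, x_next

def equilibrium_size(x):
    """Equilibrium N for a frozen allele frequency x, found by solving wbar = 1."""
    return 1.0 / (x * x / K["11"] + 2 * x * (1 - x) / K["12"] + (1 - x) ** 2 / K["22"])

def simulate(N0=100.0, x0=0.1, generations=3000):
    """Iterate the joint system from (N0, x0)."""
    N, x = N0, x0
    for _ in range(generations):
        N, x = step(N, x)
    return N, x

N_eq, x_eq = simulate()
best_on_grid = max(equilibrium_size(i / 1000) for i in range(1001))
print("simulated equilibrium:", round(N_eq, 3), round(x_eq, 4))
print("largest equilibrium size over a grid of x:", round(best_on_grid, 3))
```

With these values the heterozygote has the largest carrying capacity, the allele frequency settles at an interior value, and the population size reached exceeds that attained at either fixation point ($K_{11} = 900$ or $K_{22} = 800$). No other allele frequency on the grid gives a larger equilibrium size.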

Consider next a set of k co-evolving species. Here two different forms of 
behavior occur. The first of these arises when the fitnesses for any species 
are not directly functions of the allele frequencies in other species, although 
they may do so indirectly by depending on the sizes of the other species, 
which in turn are determined by these frequencies. In the second case these 
fitnesses do depend directly on the allele frequencies in other species. The 
analysis of the second case is quite complex, although Roughgarden has 
been able to give explicit principles governing its evolutionary behavior. 
Here we concentrate on the first case, for which 

$$\Delta N_i(t) = N_i(t)\{\bar{w}_i(N_1, \ldots, N_k) - 1\}, \quad i = 1, \ldots, k. \tag{8.15}$$

Here, as indicated, $\bar{w}_i$ depends on $N_1, \ldots, N_k$, as well as on the frequency 
$x_i$ of $A_1$ in species $i$, but not directly on allele frequencies for species other 
than species $i$. Define the gradient matrix $A = \{a_{ij}\}$ by 

$$\begin{aligned} a_{ij} &= \partial(\Delta N_i)/\partial N_j \quad \text{at equilibrium} \\ &= N_i\,\partial\bar{w}_i/\partial N_j \quad \text{at equilibrium}. \end{aligned} \tag{8.16}$$

It is necessary to introduce the feedback $F$ of the system, defined by 

$$F = (-1)^{k+1}|A|, \tag{8.17}$$

where $|A|$ is the determinant of $A$. A sub-community of order $k - 1$ may be 
defined by deleting species $i$ from the system, and in this case the feedback 
in the sub-community is 

$$F_i = (-1)^k|A_i|, \tag{8.18}$$

where $A_i$ is obtained from $A$ by striking out the $i$th row and $i$th column of 
$A$. For the equilibrium to be stable it is necessary that $F < 0$ and $\sum_i F_i < 0$. 
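
The feedback quantities are straightforward to compute for a small community. The following sketch assumes the definitions $F = (-1)^{k+1}|A|$ and $F_i = (-1)^k|A_i|$, with $A_i$ obtained by deleting row and column $i$ of the gradient matrix, and takes the necessary conditions for stability to be $F < 0$ and $\sum_i F_i < 0$. The matrix entries are hypothetical, and the function names are introduced only for illustration.

```python
def det(M):
    """Determinant by Laplace expansion along the first row (adequate for small k)."""
    n = len(M)
    if n == 0:
        return 1.0  # determinant of the empty matrix, by convention
    if n == 1:
        return M[0][0]
    total = 0.0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def feedback(A):
    """Feedback of the whole system, assumed to be F = (-1)^(k+1) |A|."""
    k = len(A)
    return (-1) ** (k + 1) * det(A)

def sub_feedback(A, i):
    """Feedback of the sub-community without species i (0-based): (-1)^k |A_i|."""
    k = len(A)
    A_i = [row[:i] + row[i + 1:] for r, row in enumerate(A) if r != i]
    return (-1) ** k * det(A_i)

# Hypothetical gradient matrix for two competing species, a_ij = N_i * d(wbar_i)/dN_j.
A = [[-0.5, -0.2],
     [-0.3, -0.6]]

F = feedback(A)
F_sub = [sub_feedback(A, i) for i in range(len(A))]
print("F =", round(F, 4))
print("F_i =", [round(f, 4) for f in F_sub], " sum =", round(sum(F_sub), 4))
print("necessary conditions met:", F < 0 and sum(F_sub) < 0)
```

In this example both sub-community feedbacks are negative, so that under the second principle stated below each species would have its equilibrium size maximized by its own genetic evolution.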

Apart from the difference equation (8.15), there is also a second equation 
describing the genetic evolution in each population, namely 

$$\Delta x_i = x_i(1 - x_i)\{w_{11,i}x_i + w_{12,i}(1 - 2x_i) - w_{22,i}(1 - x_i)\}/\bar{w}_i, \tag{8.19}$$

where 

$$\bar{w}_i = x_i^2 w_{11,i} + 2x_i(1 - x_i)w_{12,i} + (1 - x_i)^2 w_{22,i}, \tag{8.20}$$

and the $w_{jl,i}$ depend, in a way we do not make explicit, on $N_1, \ldots, N_k$. 

The joint ecological-genetical evolution of the system is now determined by 
(8.15) and (8.19) and has the following important equilibrium properties 
as presented by Roughgarden (1976, 1977). 

Principle 1. Suppose that, under the assumptions made above, there exists 
for any fixed $x_1, x_2, \ldots, x_k$ a unique locally stable equilibrium for the purely 
ecological model. Then an equilibrium point in the co-evolutionary model 
is locally stable if and only if $\bar{w}_i$ is maximized locally with respect to $x_i$ at 
that point. 

This principle concerns gene frequencies. The second principle, stated 
below, concerns population numbers. 

Principle 2. Under the above assumptions, the equilibrium size of species 
$i$ is either maximized or minimized, at a stable equilibrium, at the 
equilibrium value of $x_i$. Maximization occurs if $F_i < 0$ and minimization if $F_i > 0$. 
If $F_i = 0$ then the equilibrium size of species $i$ is not affected by genetic 
evolution in that species. Further, $F_i < 0$ for at least one species in that 
system. This last result follows immediately from the condition $\sum_i F_i < 0$ 
at a stable equilibrium. 

We do not prove these remarkable principles and note only the dual 
optimality of both genetic and ecological parameters at stable equilibria. 
Roughgarden gives particular examples of the application of these princi- 
ples, together with further generalizations, but we do not consider these 
less mathematical analyses here. 



8.7 Sociobiology 

Sociobiology is the study of the biological basis of social behavior. For this 
study to be meaningful it must be assumed that any behavior of interest 
has, at least in part, a genetical basis. Some behaviors, if they are 
genetically based, pose particular problems for evolutionary theory, the most 
outstanding example being that of altruism. Genes for altruism, if they 
exist, are at an immediate selective disadvantage in the population and 
should then be presumably lost from the population. Indeed Wilson (1975, 
p. 3) has claimed that the central theoretical problem of sociobiology is to 
explain how altruism can evolve by natural selection, assuming that there 
is a genetic basis for this character. For variations on this theme see Wilson 
(1977). 

The sociobiological explanation for the existence of altruism, insofar as 
it is determined genetically, is through kin or group selection. While the 
behavior is disadvantageous to the individual exhibiting it, the altruistic 
act is sufficiently favorable to some small related or unrelated group so that 
the trait evolves by intergroup selection. In the Origin of Species Darwin 
also invoked intergroup selection arguments for similar traits. In this very 
brief section we outline aspects of the theory on which this conclusion is
based. For an excellent survey of group selection models, see Wade (1978). 

The major quantitative construct necessary for kin selection arguments is 
some measure of the degree of relatedness between two individuals. Various 
measures are available for this, the most commonly used perhaps being the 
“coefficient of kinship”. This is defined as the probability that, for a given
locus, a gene drawn at random from one individual is identical by descent 
to a gene drawn at random from the second individual. For any given 
degree of relatedness, this coefficient may be calculated by a standard path 
analysis method due to Wright (1921). The quantification for the kinship- 
based theory goes back to Haldane (1932a) and Fisher (1958, p. 178), and 
has been developed in detail by Hamilton (1964) through the concept of 
inclusive fitness. Under this theory it is claimed that the altruistic act is 
favored, for relatives of a given degree, if the number of relatives of this 
degree who survive as a result of it exceeds the reciprocal of the coefficient 
of kinship between them and the altruistic individual. 

The relation between the inclusive fitness concept, altruism, multilocus 
evolutionary theory and evolution considered as an optimizing procedure 
is a controversial one. It has been discussed among others by Feldman 
and Cavalli-Sforza (1978, 1981), Grafen (1984), Hamilton (1996), Ham- 
merstein (1996), Marrow et al. (1996) and Schwartz (2002). This topic will 
be discussed at length in Volume II. 

It is interesting to observe that models for the evolution of altruism can 
be constructed, using group selection methods, where no concept of kin 
selection is invoked. Perhaps the most interesting of these models is that 
of Matessi and Jayakar (1976), which we now briefly describe. Consider 
an infinitely large population subdivided into finite groups of fixed size N. 
These groups are founded anew each generation in the following way. First, 
the entire population breeds at random in a common mating area and then 
splits up into groups of size $N$, the membership of each group being entirely
random. Suppose within a given group there are $i$ “altruists” and $N - i$
“non-altruists” and that the fitness of each altruist is then $\phi_A(i)$ and that
of each non-altruist is $\phi_N(i)$. The altruism assumption is that

$$\phi_A(i) < \phi_N(i), \qquad i = 1, 2, \ldots, N - 1. \tag{8.21}$$

The mean fitness of a group containing i altruists is now 

$$\phi(i) = \{\, i\,\phi_A(i) + (N - i)\,\phi_N(i) \,\}/N. \tag{8.22}$$

If it happens that $\phi(i + 1) > \phi(i)$, the existence of the altruists in the
group favors the group as a whole, and if this advantage is sufficiently large
compared to the disadvantage of altruists within each group, altruism will 
in some circumstances be favored. This argument is, at the moment, non- 
genetic but can be placed on a genetic basis by assuming certain genotypes 
for altruism. 

The simplest possible forms for the functions $\phi_A(i)$, $\phi_N(i)$ are

$$\phi_A(i) = \beta_0 + \alpha_0 i, \qquad \phi_N(i) = \beta_1 + \alpha_1 i. \tag{8.23}$$

The conditions (8.21) and $\phi(i + 1) > \phi(i)$ jointly require

$$0 < \beta - (2N - 1)\alpha + N, \tag{10.54a}$$

$$-N\alpha < \beta < -\alpha + N, \tag{10.54b}$$

where $\alpha = (\alpha_1 - \alpha_0)/\alpha_1$, $\beta = (\beta_1 - \beta_0)/\alpha_1$.

For any given value of $N$, the conditions (10.54) define a set of $\alpha$ and $\beta$
values within which, under this model, altruism can be expected to evolve.
This set is a convex region in the $(\alpha, \beta)$ plane, and the smallest rectangle
enclosing this set is

$$-N/(N - 1) < \alpha < N/(2N - 1), \qquad 0 < \beta < N^2/(N - 1). \tag{8.55}$$

The area of the convex set relative to that of the rectangle defined by (8.55) 
defines a very crude measure of the “likelihood of altruism”. Despite the 
obvious limitations of this definition, it is interesting to observe that this 
measure decreases as a function of N from about 0.72 at N = 2 to 0.09 
at N = 20. Clearly, even though kin selection is not involved, this form of 
altruism can arise in small, albeit temporarily formed, groups. 




9 



Molecular Population Genetics: 
Introduction 



9.1 Introduction 

In the preceding chapters of this book the basic genetic unit was taken 
as the gene, and the basic numerical quantity was the gene frequency. 
In particular, the fundamental unit step in evolution was taken as the 
replacement of one gene (more strictly, allele) by another in a population, 
and static genetic polymorphisms were usually described in terms of forces 
acting on gene frequencies. While in Chapters 6 and 7 the point was made 
that these polymorphisms are better viewed through forces acting on sets 
of genes at many loci, it remains true that no genetic unit finer than the 
gene has yet been considered in this book. 

In this and the remaining chapters we consider the molecular population 
genetics theory arising from the recognition of the gene as a sequence of 
nucleotides. The task of placing population genetics theory on a molecular 
basis was begun by Kimura (1971); see also Nei (1975). To some extent 
the purely mathematical theory of the previous chapters carries through 
to the molecular level, with the nucleotide frequency replacing the gene 
frequency as the primary variable, but clearly, new models and viewpoints, 
as well as new “typical” values of various fundamental genetic parameters, 
are necessary at the molecular level. 

Nucleotide sequences up to essentially the entire genome level are now 
available for many species. This chapter considers the theory relating to 
such sequences, since the theory relating to amino acid sequences is
complicated because of problems concerning the genetic code and its redundancy
properties. 

We now consider four points where the mathematical population genet- 
ics theory based on nucleotide frequencies differs from the classical theory 
based on gene frequencies. First, the molecular theory is dynamic, in con- 
trast to the often static classical theory. Mutations are usually seen as 
leading to new allelic types rather than back to currently or previously 
existing types, since it is plausible that most nucleotide mutations will 
lead to sequences not currently existing in the population. Both the in- 
finitely many alleles and the infinitely many sites models discussed in 
this chapter were originally proposed with this view in mind (Kimura and 
Crow (1964), Kimura (1969)). The dynamic nature of molecular population 
genetic models has been stressed in particular by Kimura (1971). 

Secondly, because of extremely small intracistronic recombination rates, 
perhaps of order $10^{-5}$ or less, the assumption that the different sites within
one gene evolve independently is particularly questionable. The mathe- 
matical theory of Chapters 6 and 7, there referring to genes and gametes 
rather than nucleotides and nucleotide sequences, shows that the evolution 
of tightly linked systems usually cannot be predicted from independent 
consideration of the separate loci (or, here, sites). Thus at the nucleotide 
sequence level various formulas in Chapters 3, 6 and 7 will be viewed
differently than at the gametic level. For example, if (3.138) is used to compute
fixation probabilities of two-locus gametes, the assumption $NR \gg 1$ may
normally be made unless the loci are very close. In this case, the fixation
probability (3.138) for gamete $i$ becomes, essentially,

$$c_i(0) + \eta_i D(0). \tag{9.1}$$

This is just the product of the probabilities of fixation of the two alleles 
that make up gamete i, so that the fixation processes at the two loci are 
effectively independent. At the molecular level, on the other hand, it is pos- 
sible that NR is small, since we might well be considering two nucleotides 
within the same gene, or cistron, and in this case the fixation probability 
for “gamete $i$” is close to $c_i(0)$. Thus each two-site “gamete” evolves largely
as a unit, and the fixation processes at the two sites are closely associated. 
Clearly the likely numerical values of the parameter R in the two cases af- 
fect the way in which (3.138) can be used to assess properties of concurrent 
nucleotide and gene fixation processes. 

Thirdly, while the classical theory concerns the evolution of genes given 
labels “$A_1$”, “$A_2$”, etc., at the molecular level the actual genetic material is
known, so that the symbols a, g, c, and t refer to specific rather than type
entities. The fact that the theory thus concerns ultimate and real entities is 
of great importance, and further reference will be made to it in a moment. 
It also allows evolutionary inferences not closely associated with classical 
population genetics theory. For example, the considerable redundancy of 
the third nucleotide of a triplet in determining amino acids has been used by 
Kimura (1977), Cornish-Bowden and Marson (1977), Barker et al. (1978),
and Berger (1978) for this purpose. We do not pursue these developments
here. 

Finally, and perhaps most important, molecular considerations often lead 
to retrospective rather than prospective evolutionary questions. The great 
work of Fisher, Haldane, and Wright was largely prospective: Given rea- 
sonable numerical values for various genetic parameters, they showed that 
evolution as a genetic process could and would occur. A hundred years ago 
such an undertaking was required. It is, however, no longer necessary to do 
this, and it now appears more useful to attempt to describe the course that 
evolution has taken by a retrospective analysis, and thus to gain empirical 
insight into evolutionary questions. This change of viewpoint has also led 
to the introduction of statistical methods for analyzing current genetical
data, considered briefly in Chapters 10, 11, and 12. These matters will be 
discussed in greater detail in Volume II, taking up far more realistic cases 
than are considered in this book. The current emphasis on statistical infer- 
ence procedures is perhaps the most important new direction in the theory 
in recent times. Knowledge of the actual genetic material is essential for 
these inferences, and the entire retrospective analysis must therefore be 
carried out in the framework of molecular population genetics. 



9.2 Technical Comments 

As stated above, two frequently used population genetic models have been 
inspired by the knowledge of the molecular structure of the gene, namely, 
the infinitely many alleles model and the infinitely many sites model. Pop- 
ulation properties of these models are discussed in Sections 9.3 and 9.4 
respectively. Our main interest is in properties of samples under both mod- 
els, and these are discussed in Sections 9.5, 9.6, and 9.7. Various “time” and 
“age” properties are discussed in Section 9.9. Unless stated otherwise,
selective neutrality, stationarity, and a constant population size are assumed
throughout. The last assumption is clearly inappropriate for the human 
population, and models allowing for this expansion are a major topic of 
current research. So far as the assumption of neutrality is concerned, tests 
of this assumption are discussed in Chapter 11. 

The sample size is denoted throughout by n (genes) and the population 
size by $N$. Since a diploid population is assumed, the number of genes in the
population is $2N$. Because we often compare sample and population
properties, we depart in this chapter from previous notation and write suffixes
“$n$” and “$2N$” when appropriate (for example, $K_n$ and $K_{2N}$) to distinguish
between sample and population quantities, respectively. (In later chapters, 
where sample properties only are discussed, we do not use a suffix.) 

So far as the infinitely many sites model is concerned, because our main
interest is in a collection of sites within a single gene, the assumption of no 
recombination between sites is the most appropriate one. This assumption 
is then made throughout this chapter when the infinitely many sites model 
is considered, except where otherwise stated. The theory for this case is 
largely due to the outstanding pioneer paper by Watterson (1975), to which 
we refer often. When there is no recombination between sites, the infinitely 
many sites model may also be viewed as an infinitely many alleles model, 
a connection that we explore below. 

Much research currently centers around single nucleotide polymorphisms 
(SNPs). The infinitely many sites model is appropriate for the analysis of 
these. The classic definition of a polymorphism, given by Harris (1980, p. 
331) in the context of protein polymorphism, is that a locus is polymorphic 
if the population frequency of the most frequent allele in the population 
of interest is no more than 0.99. However, this definition is, of course, 
arbitrary, and is not always implicit in published SNP polymorphism cal- 
culations, especially since observed SNP data refer to a sample, whereas 
the definition of polymorphism given above refers to a population. In this 
chapter we give calculations connecting sample data and the probability of 
population polymorphism. 

Within each section, properties of the Wright-Fisher model, the nonover- 
lapping generations Cannings model, and the Moran model are discussed. 
Despite its importance, little work is available in the literature on the 
Cannings model, and it is assumed throughout that formulas for the 
nonoverlapping generation form of this model are close in form to those 
for the Wright-Fisher model, with an appropriate change in the definition 
of the parameter $\theta$, which occurs in many formulas. For Wright-Fisher
models the interpretation of this parameter is $\theta = 4Nu$, where $u$ is the
per-gene mutation rate, assumed to be small and in the diffusion approximation
of order $N^{-1}$. When Wright-Fisher formulas are used for nonoverlapping
generation Cannings models, the interpretation is $\theta = 4Nu/\sigma^2$, where $\sigma^2$ is
defined in Section 3.3. These notational conventions are assumed
throughout this chapter, without any further comment. For the Moran model the
definition of $\theta$ is $2Nu/(1 - u)$. The definition of $u$ is straightforward in the
infinitely many alleles model but less straightforward in the infinitely many 
sites model: The definition of u for this model is that u is the probability 
that there is at least one mutant nucleotide in any newborn in the DNA 
sequence under consideration. 
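
The three conventions for $\theta$ described in this paragraph can be collected in a short Python sketch. It is illustrative only: the function names and the numerical values of N, u and sigma2 are assumptions made for this example and are not taken from the text.

```python
def theta_wright_fisher(N, u):
    """theta = 4Nu for the diploid Wright-Fisher model."""
    return 4.0 * N * u


def theta_cannings(N, u, sigma2):
    """theta = 4Nu / sigma^2 when Wright-Fisher formulas are applied to a
    nonoverlapping-generation Cannings model with variance parameter sigma^2."""
    if sigma2 <= 0:
        raise ValueError("sigma2 must be positive")
    return 4.0 * N * u / sigma2


def theta_moran(N, u):
    """theta = 2Nu / (1 - u) for the Moran model."""
    if not 0 <= u < 1:
        raise ValueError("u must lie in [0, 1)")
    return 2.0 * N * u / (1.0 - u)


if __name__ == "__main__":
    N, u = 10_000, 2.5e-5  # hypothetical population size and mutation rate
    print("Wright-Fisher:", theta_wright_fisher(N, u))
    print("Cannings (sigma2 = 2):", theta_cannings(N, u, 2.0))
    print("Moran:", theta_moran(N, u))
```

With the same N and u the Moran value is close to half the Wright-Fisher value, which is one reason why numerically similar formulas in the two models should not be compared without first checking which definition of $\theta$ is in force.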

Formulas for Wright-Fisher models, and thus for Cannings models, are 
all diffusion approximations, while those for Moran models are often exact. 
This has one important consequence. The diffusion process corresponding 
to the Wright-Fisher model, and the discrete Moran model process itself, 
are both time-reversible. This implies that many results found by going 
forward in time have an interesting interpretation going backward in time, 
and conversely; one in effect gets two results for the price of one for these
two processes. 

Diffusion formulas concerning time or ages are usually given in diffusion 
time units. For the Wright-Fisher model, for example, the time unit is 2N 
generations. On some occasions it is convenient to convert diffusion times 
to generations, and on other occasions they are best left in diffusion time 
units. 

Finally, we repeat a comment made above, that the theory considered 
here is introductory and does not consider complications due to variable 
population size, geographical subdivision, and so on. These complications 
must be taken into account in any significant data analysis, and will be 
addressed in detail in Volume II. 



9.3 Infinitely Many Alleles Models: Population 
Properties 

9.3.1 The Wright-Fisher Model 

The neutral Wright-Fisher infinitely many alleles model was introduced 
and in part discussed in Sections 3.6 and 5.10. For example, in Section 
5.7 a very close approximation for the monozygosity probability $P_{\mathrm{mono}}$ was
found, and in Section 5.10 various other infinitely many alleles results, for
example (5.123) and (5.124), were also described. In this section we discuss
this model further. As noted in Section 9.2, the notation $\theta = 4Nu$ will be
used, with the definition $\theta = 4Nu/\sigma^2$ applying when Wright-Fisher results
are used for the Cannings model. Essentially all results given are diffusion 
approximations. 

We first consider the number $K_{2N}$ of alleles present in the population
at any one time. If $K_{2N} = 1$, the population is monomorphic (see (5.66)),
and it was shown in Section 5.7 that the expression in (5.82), which is the
diffusion approximation (3.96) with the sample size $n$ formally replaced
by the population size $2N$, provides an excellent approximation for the
probability of monomorphism. We have no right to expect this to occur, 
since the analysis of Section 3.6 assumes a sample size far less than the 
population size. We take up this point further in Chapter 10. 

Of perhaps greater interest than the probability of monomorphism is the 
probability of population polymorphism as defined (by Harris) in Section 
9.2. The calculations in (5.63) show that this probability is 

$$\text{Probability of population polymorphism} = 1 - (0.01)^{\theta}. \tag{9.2}$$

If the Harris value 0.99 is replaced by the general value $1 - \delta$ for some
small $\delta$, then (9.2) is replaced by

$$\text{Probability of population polymorphism} = 1 - \delta^{\theta}. \tag{9.3}$$
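
As a small numerical illustration of (9.2) and (9.3), the following Python sketch evaluates the diffusion approximation $1 - \delta^{\theta}$ for the probability of population polymorphism. The function name and the values of $\theta$ used are assumptions chosen for the example.

```python
def prob_polymorphism(theta, delta=0.01):
    """Diffusion approximation 1 - delta**theta to the probability that the
    most frequent allele has population frequency at most 1 - delta.
    The default delta = 0.01 corresponds to the Harris value 0.99."""
    if theta < 0:
        raise ValueError("theta must be nonnegative")
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return 1.0 - delta ** theta


if __name__ == "__main__":
    for theta in (0.01, 0.1, 0.5, 1.0, 2.0):  # hypothetical theta values
        print(theta, prob_polymorphism(theta))
```

For $\theta = 1$ the value is exactly 0.99, and the probability falls off quickly as $\theta$ decreases toward zero.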

We turn next to other properties of $K_{2N}$. No exact formula is known for
the mean of $K_{2N}$, although the diffusion approximation to it is given in
(3.92). More detailed information is provided by (3.93). Apart from this,
little is known about the complete distribution of $K_{2N}$ or the frequencies
of the alleles present. Fortunately, our interest is mainly in properties of 
samples of genes from the population, rather than in the population itself, 
and here substantial information on sample numbers and frequencies is 
available, as discussed in Section 9.5. 

Certain properties of the neutral infinitely many alleles model can be 
found immediately from the “two-allele” theory of Chapter 5. This is
possible because, in the infinitely many alleles case, all alleles other than $A_1$
can be grouped simply as the “allele” “not-$A_1$”, and often this is sufficient
to answer certain “infinitely many alleles” questions. Thus two-allele
theory leads to the infinitely many alleles model probability (5.63), and several
similar results are available. 

For example suppose, in the Wright-Fisher two-allele model (3.16) with 
mutation from $A_1$ to $A_2$ at rate $u$, but with no reverse mutation, that
the allele $A_1$ has a current frequency of unity. The mean time until $A_1$ is
lost from the population can then be found immediately from (3.18) and
(3.19), with $p = 1$. In the infinitely many alleles case, we can use these
results by identifying $A_1$ with all the alleles initially in the population and
$A_2$ with all new mutant alleles. From this we can find the mean time, in
the infinitely many alleles model, until all the original alleles are lost from 
the population. The resulting mean time is, approximately, 

$$4N \sum_{j=1}^{\infty} \{j(j + \theta - 1)\}^{-1} \text{ generations}, \tag{9.4}$$

as was given in (3.23). A slightly more accurate approximation is

$$4N \sum_{j=1}^{2N} \{j(j + \theta - 1)\}^{-1} \text{ generations}. \tag{9.5}$$

The case $\theta = 2$ is of some interest. For this value of $\theta$ the expression in
(9.5) reduces to

$$4N - 2 \tag{9.6}$$

generations. This is identical to the conditional mean fixation time given in 
(5.36), which in turn is identical to the conditional mean loss time, given 
initially $2N - 1$ genes of the allele $A_1$. The reason why the unconditional
mean time in the mutation process and the conditional mean time in the
nonmutation process are essentially identical for the case $\theta = 2$ can be seen
from the fact that in the two corresponding diffusion processes, the drift
and diffusion coefficients $a(x)$ and $b(x)$, given respectively in (5.61) and
(4.58), are identical. Identical arguments show that the same mean time
applies when there is a single initial $A_1$ gene when the condition is made,
in the no mutation case, that $A_1$ eventually fixes in the population. This
mean time then has the interpretation as the mean time back to the most 
recent common ancestor gene of all genes in the current population, as we 
observe in the discussion surrounding (10.6). 

We return to the expression (9.5) in Chapter 10, where it will be 
shown that the individual terms in (9.5) have an important interpretation 
regarding the past history of the population. 
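
The finite sum in (9.5) is easy to evaluate directly. The Python sketch below (the function name and the value of N are assumptions for illustration) computes $4N \sum_{j=1}^{2N} \{j(j + \theta - 1)\}^{-1}$ and, for $\theta = 2$, compares it with $4N - 2$. In that case the sum telescopes to $2N/(2N + 1)$, so that (9.5) equals $8N^2/(2N + 1) = 4N - 2 + 2/(2N + 1)$.

```python
def mean_loss_time_generations(N, theta, terms=None):
    """Evaluate 4N * sum_{j=1}^{terms} 1 / (j (j + theta - 1)).

    With terms = 2N (the default) this is the finite sum (9.5); a very large
    value of terms approximates the infinite sum (9.4)."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    if terms is None:
        terms = 2 * N
    total = sum(1.0 / (j * (j + theta - 1.0)) for j in range(1, terms + 1))
    return 4.0 * N * total


if __name__ == "__main__":
    N = 1000  # hypothetical diploid population size
    value = mean_loss_time_generations(N, theta=2.0)
    print("finite sum (9.5):", value)
    print("closed form 8N^2/(2N+1):", 8.0 * N * N / (2 * N + 1))
    print("approximation 4N - 2:", 4 * N - 2)
```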

Returning to the case of a single allele $A_1$ with initial frequency 1, a
calculation generalizing that leading to (9.5) can be made for selective models.
Ohta (1974, 1976) has claimed that most gene fixation processes in
evolution concern very slightly deleterious alleles. Consider then an infinitely
many alleles model in which a given allele $A_1$ has initial frequency 1. We
suppose that $A_1A_1$ individuals have fitness 1, that all $A_1A_j$ heterozygotes
have fitness $1 - s$, and that all other genotypes have fitness $1 - 2s$. The
mean time until one or other deleterious allele fixes must exceed the mean
time until loss of $A_1$, and the latter mean time may be found immediately
from two-allele theory using a generalization of (3.19) (see Ewens (1969c,
equation (5.39)) and Li and Nei (1977, equation 1)). If $\alpha = |2Ns|$, this mean
time is, in generations,



$$T(1) = 2N \int_0^1 t(x)\, dx, \tag{9.7}$$

where

$$t(x) = 2x^{-1}(1 - x)^{\theta - 1} \exp(2\alpha x) \int_0^x (1 - y)^{-\theta} \exp(-2\alpha y)\, dy. \tag{9.8}$$

This mean time is calculated by Li and Nei (1977) for various $(\theta, \alpha)$
combinations. As expected, it is extremely large even for moderate values of
$\alpha$, increasing (for $\theta = 1$) from $40N$ generations for $\alpha = 2.5$ to $5 \times 10^6 N$
generations for $\alpha = 10$. We conclude that the evolutionary role of these
recurrent deleterious mutants is negligible if $\alpha$ is 5 or more.
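
The order of magnitude of these mean times can be reproduced with a simple quadrature. The Python sketch below is not the method of Li and Nei (1977). It assumes that the mean time is $2N \int_0^1 t(x)\,dx$ generations with $t(x) = 2x^{-1}(1 - x)^{\theta - 1} \exp(2\alpha x) \int_0^x (1 - y)^{-\theta} \exp(-2\alpha y)\,dy$, the standard diffusion form, and uses a plain midpoint rule. The function name and step count are choices made for this example.

```python
import math


def mean_loss_time_over_N(theta, alpha, steps=200_000):
    """Approximate T(1)/N, where T(1) = 2N * integral_0^1 t(x) dx and
    t(x) = 2 x^-1 (1-x)^(theta-1) exp(2 alpha x)
           * integral_0^x (1-y)^(-theta) exp(-2 alpha y) dy  (assumed form).

    The outer integral uses the midpoint rule; the inner integral is
    accumulated as a running sum over the same grid."""
    if theta <= 0 or steps < 1:
        raise ValueError("theta must be positive and steps at least 1")
    h = 1.0 / steps
    inner = 0.0  # inner integral up to the left edge of the current cell
    outer = 0.0  # running value of integral_0^1 t(x) dx
    for k in range(steps):
        x = (k + 0.5) * h
        f = (1.0 - x) ** (-theta) * math.exp(-2.0 * alpha * x)
        inner_at_x = inner + 0.5 * h * f  # crude half-cell correction
        t = (2.0 / x) * (1.0 - x) ** (theta - 1.0) \
            * math.exp(2.0 * alpha * x) * inner_at_x
        outer += t * h
        inner += h * f
    return 2.0 * outer  # T(1)/N = 2 * integral of t(x)


if __name__ == "__main__":
    print("theta = 2, alpha = 0:", mean_loss_time_over_N(2.0, 0.0))
    print("theta = 1, alpha = 2.5:", mean_loss_time_over_N(1.0, 2.5))
```

For $\theta = 2$ and $\alpha = 0$ the integrand is identically 2, so the sketch should return a value very close to 4, in line with the neutral result of about $4N$ generations. For $\theta = 1$ and $\alpha = 2.5$ it should return a value near 40.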

Much interest now centers on retrospective properties of this and other 
models, as well as on “age” properties of the alleles in a population. These 
are discussed in Section 9.9. 



9.3.2 The Moran Model 

The Moran infinitely many alleles model was introduced in Section 3.6.4. 
In this section we consider further results for this model, focusing on the 
“age” and “time” results of interest in this chapter. As noted above, the 
definition of the parameter $\theta$ for the Moran model is $\theta = 2Nu/(1 - u)$, and
this definition applies throughout this section for this model. 





If the population is monomorphic we say that the single allele present in 
the population is “quasi-fixed”. We do not use the expression “fixed”, since
in an infinitely many alleles model this allele will eventually be lost from
the population. Kelly (1976) has shown, for the Moran model, that the
probability that a new mutant allele becomes quasi-fixed in the population
is $C^{-1}$, where



We now consider the mean number of birth and death events until all 
alleles present in the population at any time are lost. The value given in 
(9.5) for this mean is a diffusion approximation, applying for the Wright- 
Fisher model. In the case of the Moran model an exact calculation can be 
made by using the results of Section 3.4, regarding all the alleles in the 
population as $A_1$ and with initially $2N$ genes of this allelic type in the
population. Watterson (1976a) found that the required mean number of 
birth and death events is 

$$2N(2N + \theta)(\theta - 1)^{-1} \sum_{j=1}^{2N} j^{-1} \left\{ 1 - \binom{2N}{j} \binom{2N + \theta - 1}{j}^{-1} \right\}. \tag{9.10}$$

(A formula different from (9.10), found by applying l'Hôpital's rule, applies
for the case $\theta = 1$.) In the case $\theta = 2$, the expression (9.10) gives, exactly,
$8N^2(N + 1)/(2N + 1)$, or about $4N^2$, birth and death events. This can be
thought of as corresponding to $4N$ “generations”, which appears to agree
closely with the Wright-Fisher approximation in (9.6). This agreement is,
however, misleading, since the definitions of $\theta$ differ in the two models.

We make several further comments about (9.10). First, as with the cor- 
responding result for the Wright-Fisher model, we may think of (9.10) as 
providing, in this case exactly, the mean age of the oldest allele in the 
population. Second, the typical (jth) term in (9.10) is the mean number 
of birth and death events for which there are exactly j genes present of 
the various original alleles in the population before the eventual loss of all 
these alleles. Thus the expression (9.10) gives more information than might 
otherwise be thought. 

Third, although the identity is not immediately obvious, the expression 
in (9.10) is identical to the expression 



$$2N(2N + \theta) \sum_{j=1}^{2N} \frac{1}{j(j + \theta - 1)}. \tag{9.11}$$



We shall see in Chapter 10 that the individual terms in the sum also have 
an important interpretation, in this case concerning the past history of the 
population rather than its future evolution. 
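
The identity between (9.10) and (9.11) can be checked numerically. The Python sketch below assumes that (9.10) has the binomial-ratio form $2N(2N + \theta)(\theta - 1)^{-1} \sum_{j=1}^{2N} j^{-1}\{1 - \binom{2N}{j}/\binom{2N + \theta - 1}{j}\}$, with the generalized binomial coefficient evaluated as a running product. The function names and test values are choices made for this example.

```python
def moran_events_binomial_form(N, theta):
    """Assumed form of (9.10):
    2N(2N+theta)/(theta-1) * sum_j (1/j) * (1 - C(2N,j) / C(2N+theta-1,j))."""
    if theta <= 0 or theta == 1:
        raise ValueError("theta must be positive and different from 1")
    M = 2 * N
    ratio = 1.0  # running value of C(M, j) / C(M + theta - 1, j)
    total = 0.0
    for j in range(1, M + 1):
        ratio *= (M - j + 1) / (M + theta - j)
        total += (1.0 - ratio) / j
    return M * (M + theta) / (theta - 1.0) * total


def moran_events_sum_form(N, theta):
    """The form (9.11): 2N(2N+theta) * sum_j 1 / (j (j + theta - 1))."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    M = 2 * N
    return M * (M + theta) * sum(
        1.0 / (j * (j + theta - 1.0)) for j in range(1, M + 1)
    )


if __name__ == "__main__":
    N = 50  # hypothetical population size
    for theta in (0.5, 2.0, 3.7):
        print(theta,
              moran_events_binomial_form(N, theta),
              moran_events_sum_form(N, theta))
    # For theta = 2 both forms equal 8 N^2 (N + 1) / (2N + 1).
    print(8.0 * N * N * (N + 1) / (2 * N + 1))
```

The two forms differ term by term, which is why the identity is not obvious, but their totals agree to rounding error for every value of $\theta$ tried here.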

The expression in (9.11) may be written equivalently as

$$\sum_{j=1}^{2N} \frac{1}{v_j + w_j}, \tag{9.12}$$

where

$$v_j = \frac{ju}{2N}, \qquad w_j = \frac{j(j - 1)(1 - u)}{(2N)^2}. \tag{9.13}$$



In Chapter 10 we shall explain why the mean age of the oldest allele can 
be expressed in the form defined by (9.12) and (9.13). 
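
The equivalence of (9.11) and (9.12) is a short piece of algebra, sketched here for convenience. It takes $v_j = ju/(2N)$ and $w_j = j(j - 1)(1 - u)/(2N)^2$ and uses only the Moran definition $\theta = 2Nu/(1 - u)$:

```latex
\begin{align*}
v_j + w_j
  &= \frac{ju}{2N} + \frac{j(j-1)(1-u)}{(2N)^2}
   = \frac{j(1-u)}{(2N)^2}\left\{\frac{2Nu}{1-u} + j - 1\right\}
   = \frac{j(j+\theta-1)(1-u)}{(2N)^2}, \\
2N(2N+\theta)
  &= 2N\left(2N + \frac{2Nu}{1-u}\right)
   = \frac{(2N)^2}{1-u}, \\
\frac{1}{v_j + w_j}
  &= \frac{(2N)^2}{1-u}\cdot\frac{1}{j(j+\theta-1)}
   = \frac{2N(2N+\theta)}{j(j+\theta-1)}.
\end{align*}
```

Summing the last line over $j = 1, 2, \ldots, 2N$ gives (9.11).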

Finally, the Moran population model configuration process is reversible 
(Kelly, 1976). In other words, if we write down the possible states $E_1$,
$E_2, \ldots, E_{p(2N)}$ of the configuration process and the transition probabilities
between them, (2.164) holds true when we interpret $\phi_i$ as the stationary
probability of the $i$th configuration. Thus the prospective and retrospective
behaviors of the process are identical, a fact we shall take advantage of 
later in discussing the past history of the population. The time reversibility 
property is what allows an interpretation for the terms in the sum in (9.11) 
in relation to the past history of the population. 

The exact frequency spectrum (3.102) provides, almost immediately, two 
results of interest in this chapter. The first uses the concept of size-biased 
sampling, discussed in more detail in Section 9.9. In the Moran model the
probability that an individual drawn at random is of an allelic type having 
exactly j copies in the population is found by multiplying the jth term in 
(3.102) by j/(2N). This gives a value of 

for this probability. This calculation will be of use later when we consider 
“age” properties of the alleles in the population. 

Second, (3.102) allows an exact calculation of the probability of popula- 
tion polymorphism, as defined in Section 9.2. Any allele having a frequency 
exceeding 0.99 must be the most frequent allele in the population, and at 
most one allele can have such a frequency. Thus the probability that the 
most frequent allele in the population has frequency exceeding 0.99 is the 
mean number of alleles with frequency exceeding 0.99. Taking $0.99(2N)$ as
an integer M, (3.102) shows that the probability of polymorphism is 



-sAorrry -» 

This is close to $1 - (0.01)^{\theta}$, the approximate value found above for the
Wright-Fisher model using a diffusion approximation. As with other such
calculations, this apparent similarity is misleading because of the different
definitions of $\theta$ in the two models.

Many further exact and elegant results can be found for the Moran model, 
but since our main interest is in samples of genes from a population rather 
than the entire population itself, we do not consider these further. 



9.4 Infinitely Many Sites Models: Population 
Properties 

9.4.1 Introduction

We turn now to the infinitely many sites model. This was in effect
introduced by Kimura (1969) but only named as such by him later (Kimura
(1971)). In this model a gene is thought of as a long sequence of nucleotides
and an allele refers to some specific such sequence, so that different allelic 
types are just different sequences. In using the words “gene” and “allele” 
when referring to this model we imply these definitions. 

In this model a mutation is simply the change of one nucleotide type to 
another, and it is assumed in the model that any mutation arises at a site 
currently monomorphic in the population. (The concept of “infinitely many 
sites” is intended to formalize this assumption.) To compare infinitely many 
sites formulas with infinitely many alleles formulas we define a newborn to 
be a mutant if there is at least one mutant site in the newborn, and write 
the probability of this event as u. 

Various formulas depend only on the mean number of mutant sites in the 
newborn, but other formulas depend on the complete distribution of the 
number of mutant sites. In his pioneering paper, Watterson (1975) assumed 
that the number of mutant sites in any newborn has a Poisson distribution 
with parameter $\nu$. Following the definition of a mutant for the infinitely
many sites model given in Section 9.2, the probability that a newborn is
a mutant for this Poisson case is $u = 1 - e^{-\nu}$. Here we introduce the
general model for which

$$\text{Prob}(j \text{ mutant sites in any newborn}) = q_j. \tag{9.16}$$

For this model the probability $u$ that a newborn gene is a mutant is $1 - q_0$.

We follow the Watterson notation and denote the mean number $\sum j q_j$
of mutant sites in any newborn by $\nu$. Since “sites” mutation rates are very
small, $u$ and $\nu$ differ only by small-order terms, and in the diffusion
approximation $u$ and $\nu$ may be used interchangeably. We repeat two comments
made above, first that Wright-Fisher model results are diffusion approxi- 
mations and Moran model results are often exact, and second, that unless 
otherwise stated, it is assumed that there is no recombination between sites. 
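
To see how little $u$ and $\nu$ differ in the Poisson case, the following Python sketch evaluates $u = 1 - e^{-\nu}$ for a few small values of $\nu$. The function name and the values of $\nu$ are assumptions chosen for illustration.

```python
import math


def mutant_probability_poisson(nu):
    """Probability u = 1 - exp(-nu) that a newborn carries at least one
    mutant site when the number of mutant sites is Poisson with mean nu.
    math.expm1 keeps full precision for very small nu."""
    if nu < 0:
        raise ValueError("nu must be nonnegative")
    return -math.expm1(-nu)


if __name__ == "__main__":
    for nu in (1e-3, 1e-5, 1e-7):  # hypothetical per-gene rates
        u = mutant_probability_poisson(nu)
        # The relative difference (nu - u) / nu is about nu / 2.
        print(nu, u, (nu - u) / nu)
```

The relative difference between $\nu$ and $u$ is approximately $\nu/2$, which is negligible at the mutation rates considered in this chapter.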

9.4.2 The Wright-Fisher Model

In any infinitely many sites model the mean number of new sites at which 
segregation starts in the population in each generation is $2N\nu$. For the
Wright-Fisher and Cannings models, where we use diffusion
approximations, we replace this by $2Nu$. At stationarity the new alleles created by
mutation will be balanced by an equal number of alleles that, because of 
random sampling and mutation, become lost from the population. 

For this model the analogy of the formal mathematics of the segrega- 
tion process at each site to that of the segregation process at each locus in 
classical genetics is particularly interesting. For example, since for the mo- 
ment we assume selective neutrality, we can use the neutral Wright-Fisher 
model (1.48) to describe the segregation process at each site, since under 
our assumptions at most two nucleotide types are possible in the popula- 
tion at any site at any time. From (5.18), the mean number of generations 
for which the mutant nucleotide assumes the value j is 2/j. Since on av- 
erage 2Nu sites begin segregating in each generation, the mean number of 
sites at stationarity at which, at any time, there are j representatives of 
the mutant nucleotide is 



$$\frac{4Nu}{j} = \frac{\theta}{j}, \qquad j = 1, 2, \ldots, 2N - 1. \tag{9.17}$$



Passing to a continuous approximation, we see that the mean number of 
generations for which the population frequency of the mutant nucleotide 
assumes a value in $(x, x + \delta x)$ is $2x^{-1}\delta x$, $((2N)^{-1} < x < 1)$. Thus at
any time the mean number of sites $\phi(x)\delta x$ at which the mutant nucleotide
assumes a value in $(x, x + \delta x)$ is

$$\phi(x)\,\delta x = \frac{4Nu}{x}\,\delta x = \frac{\theta}{x}\,\delta x. \tag{9.18}$$



We may call (9.17) in the discrete case, or (9.18) in the continuous case, the
population frequency spectrum of the process. The stationary mean of the
number $S_{2N}$ of sites segregating in the population at any time may be
found by integrating the function $1 - x^{2N} - (1 - x)^{2N}$ with respect to
the expression $\theta x^{-1}$ given in (9.18) over the interval
$((2N)^{-1}, 1)$. This calculation yields the value $\theta \log(2N)$; a more
accurate expression (see (5.77)) is

$$E(S_{2N}) = \theta \log(2N) + 0.6775\,\theta. \tag{9.19}$$
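As a quick numerical illustration of (9.19), the following sketch evaluates the approximate stationary mean for one choice of parameters. The function name and the parameter values are illustrative assumptions, and `log` is taken to be the natural logarithm.

```python
import math

def mean_segregating_sites(theta: float, n_diploid: int) -> float:
    """Approximate stationary mean number of segregating sites, as in (9.19)."""
    return theta * (math.log(2 * n_diploid) + 0.6775)

N = 500_000        # diploid population size (illustrative value)
u = 1e-6           # mutation rate per gene per generation (illustrative value)
theta = 4 * N * u  # scaled mutation parameter, theta = 4Nu
print(f"theta = {theta:.2f}, E(S_2N) = {mean_segregating_sites(theta, N):.2f}")
```

The dependence on the population size is only logarithmic, so doubling $N$ at fixed $\theta$ adds just $\theta \log 2$ to the mean.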



This mean value applies for any recombination structure between sites.
However, further properties of the distribution of $S_{2N}$ depend on this
recombination structure. If we were to assume unlinked sites and that the
segregation processes at the various sites are independent, the number of
segregating sites in the population would be, to a close approximation, a
Poisson random variable with mean, and also variance, as given in (9.19).
This would imply that the probability $P_{\text{mono}}$ of no segregation at
any site would be

$$(2N)^{-\theta} e^{-0.6775\,\theta}. \tag{9.20}$$

Clearly, however, the independence assumption is quite unjustified for the
case we consider, that is, where there is no recombination between sites. For
the no recombination case, the infinitely many sites model reduces to the
infinitely many alleles model. For this model the expression (5.82) gives a
close approximation to $P_{\text{mono}}$. The ratio of the probabilities in
(5.82) and (9.20) is

$$e^{\gamma\theta}\,\Gamma(1 + \theta), \tag{9.21}$$

where $\gamma$ is Euler’s constant $0.577216\ldots$. This ratio differs from
1 by terms of order $\theta^2$ when $\theta$ is small.
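The order-$\theta^2$ behaviour of the ratio (9.21) can be checked numerically. The sketch below is an illustration only: the function name is an assumption, and the limiting constant $\pi^2/12$ quoted in the comment comes from the standard series for $\log \Gamma(1 + \theta)$, not from the text.

```python
import math

EULER_GAMMA = 0.5772156649015329

def mono_ratio(theta: float) -> float:
    """Ratio (9.21): exp(gamma * theta) * Gamma(1 + theta)."""
    return math.exp(EULER_GAMMA * theta) * math.gamma(1.0 + theta)

# (ratio - 1) / theta^2 should settle near pi^2 / 12 (about 0.82) as theta -> 0,
# which is what "differs from 1 by terms of order theta^2" means in practice.
for theta in (0.01, 0.1, 0.5, 1.0):
    r = mono_ratio(theta)
    print(f"theta={theta:<5} ratio={r:.6f} (ratio-1)/theta^2={(r - 1) / theta**2:.4f}")
```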

Watterson (1975) derived further properties of $S_{2N}$ under the assumption
of no recombination between sites. Most of these relate to samples and are
thus discussed in Section 9.6. For the entire population he found that the
variance of $S_{2N}$ is approximately

$$\operatorname{var}(S_{2N}) = E(S_{2N}) + \frac{\pi^2 \theta^2}{6}, \tag{9.22}$$

and that for large $N$, the complete distribution of $S_{2N}$ is approximately
Poisson with mean (9.19). The variance (9.22) differs from that arising if
independence between sites is assumed, namely $E(S_{2N})$, by terms of order
$\theta^2$. This remark and the observation following (9.21) suggest that when
$\theta$ is small, properties of the “no recombination” and the “free
recombination” models are quite close. For general values of $\theta$ and with
a small recombination fraction between adjacent sites, we expect the variance
of $S_{2N}$ to lie between $E(S_{2N})$ and the value given in (9.22). If this
is so, the value of the “free recombination” variance calculation is that it
provides a lower bound to the variance of $S_{2N}$ when some recombination
between sites is allowed.

Provided that the same fitness structure holds at all sites and that the
stochastic processes at the various sites are assumed independent, the
calculations leading to (9.19) can be generalized to take selection into
account. Perhaps the most interesting selective scheme is that in which the
mutant nucleotide is at a slight selective disadvantage $s$ ($< 0$) with
respect to the prevailing type, with no dominance. Here (5.48) shows that the
right-hand side in (9.18) should be replaced by

$$\theta \{x(1 - x)\}^{-1} \{e^{\alpha(1 - x)} - 1\} \{e^{\alpha} - 1\}^{-1}\,\delta x, \tag{9.23}$$

where $\alpha = |4Ns|$. In principle this allows a calculation of the mean
number of segregating sites in the population through an evaluation of the
integral

$$\int_{(2N)^{-1}}^{1} \theta \{x(1 - x)\}^{-1} \{e^{\alpha(1 - x)} - 1\} \{e^{\alpha} - 1\}^{-1}\,dx,$$

but unfortunately, no simple explicit form for this integral exists.

9.4.3 The Moran Model

As for the infinitely many alleles case, the Moran model admits several
exact formulae in the infinitely many sites model. The first such formula
that we discuss is that for the mean of the number $S_{2N}$ of segregating
sites. An ergodic argument similar to that leading to (3.102) shows that this
is $v$ times the mean time that segregation continues at any site. This mean
time is given exactly by (3.51) with $i = 1$, and this leads to the exact
expression

$$\text{mean of } S_{2N} = 2Nv \left( 1 + \frac{1}{2} + \cdots + \frac{1}{2N - 1} \right). \tag{9.24}$$

Equation (9.24) contains more information than initially appears, since
the mean number of segregating sites for which the mutant nucleotide is
represented exactly $j$ times in the population is $2Nv j^{-1}$. This is the
exact Moran model analogue of the Wright-Fisher approximation in (9.17).
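The relation between the exact mean (9.24) and the exact spectrum $2Nv/j$ can be illustrated with exact rational arithmetic. This is a minimal sketch; the function names and the parameter values are assumptions made for illustration.

```python
from fractions import Fraction

def moran_spectrum(two_n: int, v: Fraction) -> list[Fraction]:
    """Exact mean number of sites with j mutant copies, j = 1..2N-1: 2N*v/j."""
    return [two_n * v / j for j in range(1, two_n)]

def moran_mean_segregating(two_n: int, v: Fraction) -> Fraction:
    """Exact mean (9.24): 2N*v*(1 + 1/2 + ... + 1/(2N-1))."""
    harmonic = sum(Fraction(1, j) for j in range(1, two_n))
    return two_n * v * harmonic

two_n, v = 20, Fraction(1, 100)   # illustrative values of 2N and v
spectrum = moran_spectrum(two_n, v)
# Summing the spectrum over j recovers the mean number of segregating sites.
print(sum(spectrum) == moran_mean_segregating(two_n, v))
```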

We now consider the exact “monomorphism” probability that $S_{2N} = 0$.
This depends on the nature of the input mutation process assumed. Watterson
(1975) found that in the Poisson mutation case, the monomorphism
probability is

$$\frac{(2N - 1)!}{(1 + \theta)(2 + \theta) \cdots (2N - 1 + \theta)}, \tag{9.25}$$

where $\theta$ is defined by

$$\theta = 2N(e^{v} - 1). \tag{9.26}$$

This definition is in line with the general definition $\theta = 2Nu/(1 - u)$
for the Moran model since, in the Poisson case, $u = 1 - e^{-v}$, and using
this value for $u$ in the general definition of $\theta$ we recover the
expression in (9.26).
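The substitution just described can be written out in one line. This only restates the algebra of the preceding paragraph, using $u = 1 - e^{-v}$ and hence $1 - u = e^{-v}$:

```latex
\theta = \frac{2Nu}{1 - u}
       = \frac{2N\,(1 - e^{-v})}{e^{-v}}
       = 2N\,(e^{v} - 1).
```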

For the general mutation model (9.16) it can be shown that (9.25) holds
if $\theta$ is defined by

$$\theta = 2N \left( \frac{1 - q_0}{q_0} \right) = \frac{2Nu}{1 - u}. \tag{9.27}$$

In the infinitely many sites model without recombination, the event that
$S_{2N} = 0$ is identical to the event that there is only one allele in the
population, and it is therefore not surprising that the expression in (9.25)
is identical to the infinitely many alleles expression (3.99) under the
definition we assume for a mutant gene in the infinitely many sites model.

The fact that the coefficient $2Nv$ on the right-hand side in (9.24) is in
general different from $\theta$ leads to the question of when $\theta = 2Nv$.
Equation (9.27) shows that these two quantities are equal when

$$v = \frac{1 - q_0}{q_0}.$$







This will occur if the number of sites at which a mutation occurs in any
newborn has the geometric distribution

$$\text{Prob}(j \text{ mutant sites in newborn}) = (1 - u)u^{j}, \qquad j = 0, 1, 2, \ldots. \tag{9.28}$$

Although this is less natural than the Poisson distribution for the input
mutation process, we shall see later that the model (9.28) has interesting
exact properties for the Moran infinitely many sites model.
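To see why (9.28) gives $\theta = 2Nv$, note that the mean of the geometric distribution matches the required value of $v$. The short derivation below assumes, as the notation suggests, that $v$ is the mean number of mutant sites in a newborn and that $q_0 = 1 - u$ is the probability of no mutant sites:

```latex
v = \sum_{j=0}^{\infty} j\,(1 - u)\,u^{j}
  = \frac{u}{1 - u}
  = \frac{1 - q_0}{q_0},
\qquad\text{so that}\qquad
2Nv = \frac{2Nu}{1 - u} = \theta .
```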



9.5 Sample Properties of Infinitely Many Alleles 
Models 

9.5.1 Introduction 

Any data concerning the genetic composition of a population derives from 
a sample from that population. Since these samples are central to the con- 
cept of the coalescent (Chapter 10), are used to test for selective neutrality 
(Chapter 11), to answer questions of interest to human geneticists (Section 
10.8), and to investigate the phylogenetic relation between a number of 
species (Chapter 12), it is appropriate to consider sampling properties in 
detail. In this section we do this for various infinitely many alleles models. 
Parallel results for infinitely many sites models are given in Section 9.6. 
The infinitely many sites model is a natural one to consider in detail, since
it is used to describe the stochastic behavior of DNA sequences. On the
other hand, the infinitely many alleles model is sometimes used to model
the evolutionary behavior of haplotypes, a topic of much current interest.
(However, see the discussion in Section 9.7 concerning pitfalls in this
procedure.)

Throughout this chapter, the number of genes in the sample is denoted
by $n$, and it is assumed that $n$ is far smaller than the number
of genes ($2N$) in the (diploid) population of size $N$.



9.5.2 The Wright-Fisher Model 

We consider first the Wright-Fisher infinitely many alleles model. The
diffusion approximation properties of a sample of $n$ genes under this model
are best summarized through the partition formula (3.83). This leads to
the distribution of the number $K_n$ of different allelic types observed in
the sample as given in (3.84) and thus to the mean of $K_n$ as given by (3.85).
While (3.83) and thus (3.85) were found in Section 3.5 by using recurrence
relations, the mean (3.85) can be found directly (see (3.94)) from the
frequency spectrum $\phi(x)$ in (3.95), using the calculation

$$E(K_n) = \int_0^1 \{1 - (1 - x)^n\}\,\phi(x)\,dx. \tag{9.29}$$

We return to this comment below when considering time-dependent
properties of this model.

There is currently much interest in estimating the parameter $\theta$.
Equations (3.83) and (3.84) show jointly that the conditional distribution of
the vector $A = (A_1, A_2, \ldots, A_n)$ defined before (3.83), given the value
of $K_n$, is

$$\text{Prob}\{A = a \mid K_n = k\} = \frac{n!}{|S_n^k|\, 1^{a_1} 2^{a_2} \cdots n^{a_n}\, a_1!\, a_2! \cdots a_n!}, \tag{9.30}$$

where $a = (a_1, a_2, \ldots, a_n)$.

Equation (9.30) implies that $K_n$ is a sufficient statistic for $\theta$.
Standard statistical theory then shows that once the observed value $k_n$ of
$K_n$ is given, no further information about $\theta$ is provided by the
various $a_j$ values, so that all inferences about $\theta$ should be carried
out using the observed value $k_n$ of $K_n$ only. This includes estimation of
$\theta$ or of any function of $\theta$.

Since $K_n$ is a sufficient statistic for $\theta$ we can use the probability
distribution in (3.84) directly to find the maximum likelihood estimator
$\hat\theta_K$ of $\theta$. It is found that this estimator is the implicit
solution of the equation

$$K_n = \frac{\hat\theta_K}{\hat\theta_K} + \frac{\hat\theta_K}{\hat\theta_K + 1} + \frac{\hat\theta_K}{\hat\theta_K + 2} + \cdots + \frac{\hat\theta_K}{\hat\theta_K + n - 1}. \tag{9.31}$$

Given the observed value $k_n$ of $K_n$, the corresponding maximum likelihood
estimate $\hat\theta_k$ of $\theta$ is found by solving the equation

$$k_n = \frac{\hat\theta_k}{\hat\theta_k} + \frac{\hat\theta_k}{\hat\theta_k + 1} + \frac{\hat\theta_k}{\hat\theta_k + 2} + \cdots + \frac{\hat\theta_k}{\hat\theta_k + n - 1}. \tag{9.32}$$

Numerical calculation of the estimate $\hat\theta_k$ using (9.32) is usually necessary.
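One simple way to carry out this numerical calculation is bisection, since the right-hand side of (9.32) increases from 1 to $n$ as $\theta$ increases from 0 to infinity. The sketch below is one possible approach under that observation; the function names and the tolerance are illustrative assumptions.

```python
def expected_alleles(theta: float, n: int) -> float:
    """Right-hand side of (9.32): sum over j = 0..n-1 of theta / (theta + j)."""
    return sum(theta / (theta + j) for j in range(n))

def theta_mle(k: int, n: int, tol: float = 1e-10) -> float:
    """Solve expected_alleles(theta, n) = k for theta by bisection."""
    if not 1 < k < n:
        raise ValueError("a finite positive solution needs 1 < k < n")
    lo, hi = 1e-12, 1.0
    while expected_alleles(hi, n) < k:     # grow the bracket until it holds the root
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if expected_alleles(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative data: k_n = 5 allelic types observed in a sample of n = 200 genes.
print(round(theta_mle(5, 200), 4))
```

The cases $k_n = 1$ and $k_n = n$ are excluded because the equation then has no finite positive solution.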

The estimator implied by (9.31) is biased, and it is easy to show that
there can be no unbiased estimator of $\theta$. On the other hand, there
exists an unbiased estimator of the population homozygosity probability
$1/(1 + \theta)$. If this estimator is denoted by $g(K_n)$, (3.84) shows that

$$\sum_{k=1}^{n} \frac{|S_n^k|\,\theta^k g(k)}{S_n(\theta)} = \frac{1}{1 + \theta},$$

where $|S_n^k|$ is the absolute value of a Stirling number, defined below
(3.84). From this, we see that

$$\sum_{k=1}^{n} |S_n^k|\,\theta^k g(k) = \theta(\theta + 2)(\theta + 3) \cdots (\theta + n - 1).$$

Since this is an identity for all $\theta$, the expression for $g(k)$ for any
observed value $k_n$ of $K_n$ can be found by comparing the coefficients of
$\theta^k$ on both sides of this equation. In particular, when $k_n = 2$,

$$g(2) = \frac{\frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n - 1}}{1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n - 1}}. \tag{9.33}$$

Unbiased estimation of $1/(1 + \theta)$ for values of $k_n$ larger than 2 is
complicated, and it is then probably more convenient to use instead the
estimator $(1 + \hat\theta_K)^{-1}$, where $\hat\theta_K$ is found from (9.31),
even though this estimator is slightly biased.
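The coefficient comparison can be mechanized for small $n$. The sketch below assumes that $S_n(\theta) = \theta(\theta + 1) \cdots (\theta + n - 1)$ and that $|S_n^k|$ is the coefficient of $\theta^k$ in that product; the helper names are hypothetical.

```python
from fractions import Fraction

def poly_mul_linear(coeffs: list[int], c: int) -> list[int]:
    """Multiply a polynomial (coeffs[i] multiplies theta^i) by (theta + c)."""
    out = [0] * (len(coeffs) + 1)
    for i, a in enumerate(coeffs):
        out[i] += a * c
        out[i + 1] += a
    return out

def rising_factorial_coeffs(n: int, skip_one: bool = False) -> list[int]:
    """Coefficients of theta(theta+1)...(theta+n-1), optionally without (theta+1)."""
    coeffs = [0, 1]                      # the polynomial "theta"
    for c in range(1, n):
        if skip_one and c == 1:
            continue
        coeffs = poly_mul_linear(coeffs, c)
    return coeffs

def unbiased_g(n: int) -> dict[int, Fraction]:
    """g(k) for k = 1..n, found by matching coefficients of theta^k."""
    stirling = rising_factorial_coeffs(n)                       # |S_n^k| at index k
    target = rising_factorial_coeffs(n, skip_one=True) + [0]    # pad so index n exists
    return {k: Fraction(target[k], stirling[k]) for k in range(1, n + 1)}

print(unbiased_g(5))
```

For $n = 5$ the entry for $k = 2$ is $13/25$, which agrees with (9.33) since $(1/2 + 1/3 + 1/4)/(1 + 1/2 + 1/3 + 1/4) = 13/25$.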

It is sometimes preferred to estimate $(1 + \theta)^{-1}$ by $f$, defined in
the notation of (3.88) by

$$f = \sum_i \frac{n_i^2}{n^2}. \tag{9.34}$$

This is a poor estimate in that it uses precisely that part of the data that
is least informative about $(1 + \theta)^{-1}$. The estimate of $\theta$
derived from $f$, namely

$$\hat\theta_f = f^{-1} - 1, \tag{9.35}$$

has been shown (Ewens and Gillespie (1974)) to be strongly biased and to
have mean square error approximately six or eight times larger than that
of $\hat\theta_K$.

More generally, the only functions of $\theta$ allowing unbiased estimation
are linear combinations of functions of the form

$$\{(a + \theta)(b + \theta) \cdots (c + \theta)\}^{-1}, \tag{9.36}$$

where $a, b, \ldots, c$ are integers with $1 \le a < b < \cdots < c \le n - 1$
(Ewens (1972)). While this fact derives mathematically from the form of the
probability distribution (3.84), an argument in support of it, from an
empirical sampling point of view, is as follows.

Suppose, for example, that $k_n = 2$ and write the unordered numbers
of genes of the two alleles observed as $N_1$ and $n - N_1$. The probability
distribution of the pair $(N_1, n - N_1)$ is identical to that of $A$, and
(9.30) shows that this is

$$\text{Prob}(N_1 = n_1) = \frac{n!}{|S_n^2|\, n_1 (n - n_1)}. \tag{9.37}$$

Given the observed values $n_1$ and $n - n_1$, the probability that two genes
taken at random are of the same allelic type is

$$\frac{\binom{n_1}{2} + \binom{n - n_1}{2}}{\binom{n}{2}}.$$

Multiplying this expression by the right-hand side in (9.37) and summing
over all possible values of $n_1$ gives the estimator (9.33). A similar
argument can be used to justify the fact that any function of the form in
(9.36) admits unbiased estimation.

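We now consider an approximation for the mean square error (MSE) of
the estimator $\hat\theta_K$ as defined by (9.31). Writing the right-hand side
of (9.31) as $\psi(\hat\theta_K)$, we have $K_n = \psi(\hat\theta_K)$ and also,
from (3.85), $E(K_n) = \psi(\theta)$. Thus by subtraction,

$$K_n - E(K_n) = \psi(\hat\theta_K) - \psi(\theta).$$

A first-order Taylor series approximation for the right-hand side is
$(\hat\theta_K - \theta)\psi'(\theta)$, so that

$$K_n - E(K_n) \approx (\hat\theta_K - \theta)\,\psi'(\theta).$$

Squaring and taking expectations, we get

$$\text{MSE}(\hat\theta_K) \approx \frac{\operatorname{var}(K_n)}{\{\psi'(\theta)\}^2}. \tag{9.38}$$

The variance of $K_n$ is given in (3.86), and it is immediate that

$$\psi'(\theta) = \sum_{j=1}^{n-1} \frac{j}{(j + \theta)^2}. \tag{9.39}$$

Since the variance in (3.86) is $\theta\,\psi'(\theta)$, this leads to

$$\text{MSE}(\hat\theta_K) \approx \frac{\theta}{\sum_{j=1}^{n-1} \dfrac{j}{(j + \theta)^2}}. \tag{9.40}$$

The approximation (9.40) appears to be quite accurate, and we use it in
Section 9.6 in comparing estimation of $\theta$ in the infinitely many alleles
and the infinitely many sites models.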
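The approximation (9.40) is easy to evaluate. The sketch below (the function name is an illustrative assumption) shows how slowly the approximate MSE falls as the sample size grows.

```python
def mse_theta_mle(theta: float, n: int) -> float:
    """Approximation (9.40): theta / sum over j = 1..n-1 of j / (j + theta)^2."""
    return theta / sum(j / (j + theta) ** 2 for j in range(1, n))

# The denominator grows only like log(n), so large samples help rather little.
for n in (10, 100, 1000, 10000):
    print(n, round(mse_theta_mle(1.0, n), 4))
```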

Griffiths (1979a,b) has found many time-dependent properties of the
number and frequencies of alleles observed in a sample of $n$ genes. These,
of course, depend on the initial population frequencies chosen as well as
on the mutation rate. At one extreme one can assume that initially only
one allelic type exists in the population, and at the other extreme that $2N$
allelic types exist in the population. Many of these properties are found
using the time-dependent frequency spectrum $\phi_t(x)$, which has the form

$$\phi_t(x) = \theta x^{-1} (1 - x)^{\theta - 1} \left( 1 + \sum_{i=2}^{\infty} \lambda_i(t)\,\psi_i(x, \theta)\,g_i(p_1, p_2, \ldots) \right). \tag{9.41}$$

In this expression the $\lambda_i(t)$ are eigenvalues whose values are given
below, $\psi_i(x, \theta)$ is a function only of $x$, $i$, and $\theta$, and
$g_i(p_1, p_2, \ldots)$ is a complicated function of the initial allelic
frequencies $p_1, p_2, \ldots$. The rate of convergence of this frequency
spectrum to the stationary spectrum $\theta x^{-1}(1 - x)^{\theta - 1}$
depends on the eigenvalues $\lambda_j(t)$, which are given by

$$\lambda_j(t) = \exp\{-\tfrac{1}{2} j(j - 1 + \theta)t\}, \qquad j = 2, 3, 4, \ldots, \tag{9.42}$$

and in particular on the largest eigenvalue $\exp\{-(1 + \theta)t\}$. These
eigenvalues are the limiting values of the discrete configuration values given
in (3.90) in the limit $N \to \infty$, $u \to 0$, with $4Nu = \theta$ held
fixed.

  theta    (i)    (ii)     (i)    (ii)     (i)    (ii)    (i) and (ii)
   0.1    1.31   10.12    1.40    4.62    1.47    2.77        1.57
   1.0    4.03   12.39    4.89    7.64    5.49    6.34        5.88
   1.5    5.51   13.62    6.74    9.25    7.54    8.18        7.90

Table 9.1. Mean number of alleles observed in a sample of 200 genes for various
$\theta$, $t$ values. Unit time = $2N$ generations. Case (i): one initial allele.
Case (ii): many initial alleles of equal frequency. From Griffiths (1979b).
(The $t$ values heading each pair of columns are missing from this copy; $t$
increases from left to right, and the final column is the stationary value.)

The mean number of alleles in a sample of $n$ genes can be found, following
the same argument as that leading to (9.29), by evaluation of

$$\int_0^1 \{1 - (1 - x)^n\}\,\phi_t(x)\,dx. \tag{9.43}$$

An explicit expression for this mean is given by Griffiths (1979b, (2.10)),
who also provides numerical calculations for various $n$, $\theta$, $t$, and
$p_j$ values. We reproduce some representative calculations in Table 9.1 for
two cases, first where there exists initially a single allele in the
population and second where there exist initially many alleles of equal
frequency. We observe that in the former case, the approach to the
equilibrium point appears rather more rapid than in the latter.
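The stationary column of Table 9.1 can be cross-checked against the stationary mean of $K_n$, which by (9.31) and the relation $E(K_n) = \psi(\theta)$ is the sum of $\theta/(\theta + j)$ over $j = 0, \ldots, n - 1$. The sketch below is only a consistency check with an illustrative function name; it does not reproduce the time-dependent entries.

```python
def stationary_mean_alleles(theta: float, n: int) -> float:
    """Stationary mean number of alleles in a sample of n genes."""
    return sum(theta / (theta + j) for j in range(n))

# Compare with the final column of Table 9.1 (sample of 200 genes).
for theta in (0.1, 1.0, 1.5):
    print(theta, round(stationary_mean_alleles(theta, 200), 2))
```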

Griffiths also found properties of two samples, one in each of two subpop- 
ulations, which split apart some time in the past. In particular, he found 
a formula for the mean number of alleles common to the two samples at 
time t after the split and the joint probability distributions of the sample 
frequencies of these alleles. 

In Chapter 11 we shall consider various tests of the hypothesis of selective 
neutrality. These tests often reduce to a comparison of properties of the 
number of alleles, or of segregating sites, in a sample to some measure of 
population homozygosity (or, equivalently, heterozygosity). Unfortunately, 
the properties of the two measures under selection are often similar to 
their properties in a selectively neutral case in which the population has 
recently expanded in size after going through a bottleneck, or at the end 
of a selectively induced replacement process at a locus closely linked to the 
neutral locus (a “selective sweep”, discussed in Section 6.7). Thus these 
tests of selection can be rendered invalid at times closely following such 
historical events. 

Table 9.1 can be used to find various properties of the number of alleles in
a sample following a bottleneck or a selective sweep, since we might assume,
to a close approximation, that only one allele survives a tight bottleneck
or a selective sweep. It shows, for example, that when $\theta = 1$, the mean
number of alleles in a sample of 200 genes is 4.89 when $N$ generations have
passed after the bottleneck or selective sweep, which is about 83% of its
stationary mean value 5.88.

The properties of the sample homozygosity should be close to those of
the population homozygosity. We take 0 to be the time of the bottleneck
and the population homozygosity at this time to be 1. With the mean
homozygosity at time $t$ (in diffusion time units) denoted by $F^{(t)}$,
(3.89) shows that

$$F^{(t)} = \frac{1}{1 + \theta} + \frac{\theta}{1 + \theta} \exp\{-(1 + \theta)t\}. \tag{9.44}$$

Thus $F^{(t)}$ depends only on the leading eigenvalue in the set (9.42),
whereas the mean number of alleles depends on all the eigenvalues. When
$\theta = 1$ the value of $F^{(t)}$ arising $N$ generations after the
bottleneck is 0.684, so that the mean heterozygosity at this time is 0.316.
This is about 63% of its stationary value. The comparison of this with the
corresponding value for the mean number of alleles in the sample is then
relevant to the effect of a bottleneck on a test for selective neutrality
conducted $N$ generations after the bottleneck or the selective sweep.
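The two recovery fractions being compared can be reproduced directly. The sketch below evaluates (9.44) at $\theta = 1$ and $t = 0.5$ (that is, $N$ generations in units of $2N$ generations) and sets the result beside the Table 9.1 entries quoted above; the function and variable names are illustrative assumptions.

```python
import math

def mean_homozygosity(theta: float, t: float) -> float:
    """(9.44): mean homozygosity t time units after a bottleneck."""
    return 1.0 / (1.0 + theta) + theta / (1.0 + theta) * math.exp(-(1.0 + theta) * t)

theta, t = 1.0, 0.5                       # N generations = 0.5 units of 2N generations
F = mean_homozygosity(theta, t)
stationary_het = theta / (1.0 + theta)    # 1 - 1/(1 + theta)
het_ratio = (1.0 - F) / stationary_het    # heterozygosity relative to stationarity
allele_ratio = 4.89 / 5.88                # Table 9.1, theta = 1, case (i) vs stationary
print(f"F = {F:.3f}, heterozygosity = {1 - F:.3f}")
print(f"heterozygosity recovered: {het_ratio:.0%}, alleles recovered: {allele_ratio:.0%}")
```

The number of alleles recovers faster than the heterozygosity, which is why a comparison of the two can mimic a departure from neutrality shortly after such an event.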



9.5.3 The Moran Model 

Many exact properties of a sample of genes can be found immediately from
the sample configuration given in (3.83), since under the Moran infinitely
many alleles model, (3.83) holds exactly if $\theta$ is defined by (3.98).
This is in contrast to the situation for the Wright-Fisher model, where
(3.83) is only an approximation. Thus with the Moran model definition of
$\theta$, (3.84), (3.85), (3.86), (3.87), and (3.88) are all exact, as is also
the conditional distribution formula (9.30) that derives from them. It is
interesting to ask why these formulas hold exactly in the Moran model, not
only in a sample but also in the population, and also why sample formulas and
population formulas are identical, on replacing $n$ by $2N$. In Chapter 10 we
shall see why these two properties of the Moran model hold.

Since for some simulation purposes it is necessary to derive a sample
of genes that have the allelic partition formula (3.83), it is interesting to
ask how such a sample may be generated efficiently. Perhaps the most
interesting method to use is Hoppe’s urn (Hoppe (1984), (1987), Watterson
(1984)). We imagine an urn containing one black ball of mass $\theta$ and a
collection of balls of various colors, each of mass 1. Initially, the urn
contains only the black ball. A ball is drawn at random from the urn with
probability proportional to its mass. If it is the black ball, the black ball
is replaced in the urn together with a new ball of a color not currently
existing in the urn. If it is a colored ball, the ball drawn is replaced
together with a new ball of the same color as that drawn. The initial ball
drawn must, of course, be the black ball. Thus if there are $j - 1$ colored
balls in the urn at any one time, the probability that the next ball to enter
the urn is of a new color, that is, the probability that the next ball drawn
is the black ball, is

$$\frac{\theta}{j - 1 + \theta}, \tag{9.45}$$

independently of the color composition of the $j - 1$ colored balls. The
process stops when there are $n$ colored balls in the urn, and the “color”
partition formula for these $n$ balls is given exactly by (3.83). This “urn”
procedure allows rapid simulation of random variables having the distribution
(3.83).

We can think of the Hoppe urn procedure as sampling “through space”,
but we shall find in Chapter 10 that the procedure has an important
interpretation as sampling “through time”.
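The urn scheme described above translates almost line by line into a simulation. The following is a minimal sketch, not an optimized sampler; the function names, the use of integer labels for colors, and the seeded generator are illustrative assumptions.

```python
import random
from collections import Counter

def hoppe_urn(n: int, theta: float, rng: random.Random) -> list[int]:
    """Return the color label of each of the n colored balls added to the urn."""
    colors: list[int] = []            # one entry per colored ball currently in the urn
    next_color = 0
    for _ in range(n):
        # The black ball has mass theta; each colored ball has mass 1.
        if rng.random() * (theta + len(colors)) < theta:
            colors.append(next_color)             # black drawn: add a new color
            next_color += 1
        else:
            colors.append(rng.choice(colors))     # colored ball drawn: copy its color
    return colors

def allelic_partition(colors: list[int]) -> dict[int, int]:
    """a_j = number of colors represented by exactly j balls."""
    return dict(Counter(Counter(colors).values()))

rng = random.Random(2024)
sample = hoppe_urn(50, 1.0, rng)
print(allelic_partition(sample))
```

Keeping one list entry per ball makes a uniform choice among the colored balls equivalent to choosing a color with probability proportional to its current count.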

A concept closely linked to Hoppe’s urn is that of a partition structure
(Kingman (1978)). There should be no particular significance attached
to the sample size $n$, and we can regard a sample of $n$ genes as one
arising from a sample of size $n + 1$, one of which was accidentally lost. We
reasonably require a consistency of formulas for the two sample sizes. To
formalize this we denote the left-hand side in the partition formula (3.83)
by $P_n(a_1, a_2, \ldots)$. The method of arriving at a sample of $n$ genes as
just described then implies that this must be equal to

$$\frac{a_1 + 1}{n + 1} P_{n+1}(a_1 + 1, a_2, \ldots) + \sum_{j=2}^{n+1} \frac{j(a_j + 1)}{n + 1} P_{n+1}(a_1, \ldots, a_{j-1} - 1, a_j + 1, \ldots). \tag{9.46}$$

The right-hand side in (3.83) does satisfy this requirement, but Kingman
raised the following much more general question: How may one characterize
probability structures satisfying (9.46)? He called structures having this
property “partition structures”, and showed that for all such structures of
interest in genetics, $P_n(a_1, a_2, \ldots)$ could be represented in the form

$$P_n(a_1, a_2, \ldots) = \int P_n(a_1, a_2, \ldots \mid x)\,\mu(dx), \tag{9.47}$$

where $P_n(a_1, a_2, \ldots \mid x)$ is a complicated sum of multinomial
probabilities whose exact form we do not write down. Kingman called $\mu$ the
“representing measure” of $P_n(a_1, a_2, \ldots)$ and found that for the
partition formula (3.83), this representing measure is the Poisson-Dirichlet
distribution, introduced in Section 5.10.

The consistency requirement (9.46) is a natural one for a sample of 
genes. We shall, however, find a perhaps more important interpretation 
for this requirement when considering, in Chapter 10, the past history of 
the population from which the sample was taken. 
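The consistency requirement can be verified exactly for small samples. The sketch below assumes that the partition formula (3.83) is the Ewens sampling formula in its usual form, and that (9.46) says a sample of $n$ genes arises by losing one gene at random from a sample of $n + 1$: a singleton with probability $(a_1 + 1)/(n + 1)$, or a gene of an allele seen $j$ times with probability $j(a_j + 1)/(n + 1)$. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def esf(a: dict[int, int], theta: Fraction) -> Fraction:
    """Ewens sampling formula; a[j] is the number of alleles seen exactly j times."""
    if any(count < 0 for count in a.values()):
        return Fraction(0)                      # impossible configuration
    n = sum(j * count for j, count in a.items())
    prob = Fraction(factorial(n))
    for i in range(n):
        prob /= theta + i                       # divide by theta(theta+1)...(theta+n-1)
    for j, count in a.items():
        prob *= theta ** count / (Fraction(j) ** count * factorial(count))
    return prob

def consistency_rhs(a: dict[int, int], theta: Fraction) -> Fraction:
    """Sum over the ways a sample of n + 1 genes can lose one gene and leave a."""
    n = sum(j * count for j, count in a.items())
    bigger = dict(a)
    bigger[1] = bigger.get(1, 0) + 1            # the lost gene was a singleton
    total = Fraction(a.get(1, 0) + 1, n + 1) * esf(bigger, theta)
    for j in range(2, n + 2):                   # the lost gene's allele had j copies
        bigger = dict(a)
        bigger[j - 1] = bigger.get(j - 1, 0) - 1
        bigger[j] = bigger.get(j, 0) + 1
        total += Fraction(j * (a.get(j, 0) + 1), n + 1) * esf(bigger, theta)
    return total

theta = Fraction(3, 2)
for a in ({1: 1}, {2: 1}, {1: 1, 2: 1}, {2: 2}, {1: 2, 3: 1}):
    print(a, esf(a, theta) == consistency_rhs(a, theta))
```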

Kingman also took up the question of “noninterference”, defined by the
requirement that if a gene is taken at random from the sample, and all $r$
genes of its allelic type are then removed from the sample, the partition
probability structure of the remaining $n - r$ genes should be the same as
that of an original sample of $n - r$ genes. Noninterference implies that
$P_n(a_1, \ldots, a_r, \ldots)$ must satisfy the requirement

$$\frac{r a_r}{n} P_n(a_1, \ldots, a_r, \ldots) = c(n, r)\,P_{n-r}(a_1, \ldots, a_r - 1, \ldots), \tag{9.48}$$

where $c(n, r)$ does not depend on $a_1, a_2, \ldots$. Kingman then showed that
of all partition structures of interest in genetics, the only one also
satisfying the requirement (9.48) is (3.83).

These various results, including those relating to the Hoppe urn process,
which might initially seem to be of purely mathematical interest, will
be seen to have a natural and important practical interpretation when we
consider the coalescent process in Chapter 10.



9.6 Sample Properties of Infinitely Many Sites 
Models 

9.6.1 Introduction

We now turn to sampling properties relating to the infinitely many sites
model. The properties of samples in the infinitely many sites model are
relevant to the theory of single-nucleotide polymorphisms, for which much
data are currently gathered and from which many inferences about the
population under consideration are made. As stated in Section 9.2, we
assume throughout, unless otherwise stated, that there is no recombination
between sites, and that selective neutrality and stationarity both obtain.
In Section 9.6.2, where the Wright-Fisher and the Cannings models are
considered, all results are diffusion approximations and the diffusion
approximation definitions $\theta = 4Nu$ (for the Wright-Fisher model) and
$\theta = 4Nu/\sigma^2$ (for the Cannings model) are used. The definition of
$\theta$ for the Moran model, for which exact results are obtained, is more
complex and is considered in Section 9.6.3.

9.6.2 The Wright-Fisher Model 

If two nucleotides at a given site segregate in a population with current
frequencies $x$, $1 - x$, the probability that a given individual is
heterozygous at this site is $2x(1 - x)$. The mean number of heterozygous
sites per individual is found by averaging this over the function
$\theta x^{-1}$ found in (9.18), yielding

$$\theta \int_0^1 x^{-1} \{2x(1 - x)\}\,dx = \theta. \tag{9.49}$$
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The same calculation applies, of course, for any two genes taken at random 
in the population. 

It is interesting to observe that (9.19) shows that for the representative
values $N = 500{,}000$, $u = 10^{-6}$, for which $\theta = 2$, there will be on
average about 29 sites segregating in the population, while (9.49) shows that
of these, on average 2 sites segregate in any given diploid individual.

Suppose now that a sample of 100 genes is taken. (We can think of the
calculation in (9.49) as referring to a sample of two genes.) The argument
leading to (9.19) shows that in the sample, the mean number of segregating
sites is $\theta\{0.6775 + \log 100\}$. This allows an estimation of $\theta$
from an observed number of segregating sites in the sample, and then from
(9.19) we are able to estimate the number of sites segregating in the
population, assuming that the population size is known. We now explore this
observation further.

Suppose that in a sample of size 100 we observe 10 segregating sites.
The results in the previous section show that we could estimate $\theta$ from
the equation

$$\hat\theta\{0.6775 + \log 100\} = 10,$$

giving $\hat\theta = 1.89$. This estimate in conjunction with (9.19) leads to
the estimate of 27.44 for the number of sites segregating in a population of
500,000. It also leads to an estimate of the probability that no segregation
occurs in the population at a randomly chosen site. Perhaps more important, it
also leads to an estimate of the probability of population polymorphism at
a given site, where we adopt the Harris definition of polymorphism given
in Section 9.2 to the site rather than the gene level.

As an example of the calculations that are possible, we suppose that the
gene consists of 2000 nucleotides. From the point of view of the individual
sites relevant to single nucleotide polymorphisms, the value
$\hat\theta = 1.89$ should be replaced by the “site” value
$1.89/2000 = 0.000945$. In this case, an estimate of the probability that a
given site is monomorphic in the population is about
$(1{,}000{,}000)^{-0.000945} = 0.987$. The estimated probability of population
polymorphism, following the Harris definition, is
$1 - (0.01)^{0.000945} = 0.0043$.
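The chain of estimates in this example can be reproduced in a few lines. The sketch below follows the calculation in the text: it inverts the sample version of (9.19), scales up to the population, and evaluates the per-site monomorphism estimate. The variable names are illustrative, and `log` is taken to be the natural logarithm.

```python
import math

C = 0.6775                      # constant appearing in (9.19)
s_obs, n, N = 10, 100, 500_000  # observed segregating sites, sample size, population size

theta_hat = s_obs / (C + math.log(n))                # solves theta{0.6775 + log 100} = 10
pop_sites = theta_hat * (math.log(2 * N) + C)        # (9.19) with theta replaced by its estimate
theta_site = round(theta_hat, 2) / 2000              # per-site value for a gene of 2000 nucleotides
p_mono_site = (2 * N) ** (-theta_site)               # estimated per-site monomorphism probability

print(f"theta_hat = {theta_hat:.2f}")
print(f"estimated segregating sites in population = {pop_sites:.2f}")
print(f"per-site theta = {theta_site:.6f}, P(site monomorphic) = {p_mono_site:.3f}")
```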

These calculations lead us to a more detailed examination of the (random)
number $S_n$ of segregating sites in the sample of $n$ genes. For the case
$n = 2$, Watterson (1975) showed that

$$\text{Prob}(S_2 = s) = \left( \frac{1}{1 + \theta} \right) \left( \frac{\theta}{1 + \theta} \right)^{s}, \qquad s = 0, 1, 2, \ldots. \tag{9.50}$$

For $s = 0$ this is $1/(1 + \theta)$. This agrees with the infinitely many
alleles expression in (3.74), as it must, since if $s = 0$, the two genes
sampled are of the same allelic type. From (9.50),

$$E(S_2) = \theta, \qquad \operatorname{var}(S_2) = \theta + \theta^2. \tag{9.51}$$
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This value for the mean of S2 agrees with the expression given in (9.49), 
found by a different approach. The variance of S2 exceeds the mean because 
of correlation between sites caused by the complete linkage between sites. 

The probability distribution for $S_n$ for general values of $n$ is more
complicated. Watterson (1975) showed that $S_n$ has the distribution of the
sum of $n - 1$ independent geometric random variables
$Y_1, Y_2, \ldots, Y_{n-1}$, where

$$\text{Prob}(Y_j = i) = \left( \frac{j}{j + \theta} \right) \left( \frac{\theta}{j + \theta} \right)^{i}, \qquad i = 0, 1, 2, \ldots, \tag{9.52}$$

and made the perceptive comment that $Y_j$ is the number of mutations
occurring in the ancestry of the sample during those times when the $n$
genes in the sample have exactly $j + 1$ distinct ancestor genes. This remark,
which we may take as heralding the development of coalescent theory, will
be developed at length in Chapter 10.

The probability that $S_n = 0$ is the product of the probabilities that each
$Y_j = 0$, and this is

$$\frac{(n - 1)!}{(\theta + 1)(\theta + 2) \cdots (\theta + n - 1)}.$$

This is identical, as it must be, to the probability, in the infinitely many
alleles model, that the number of different alleles in the sample is 1 (see
(3.87)).

The representation (9.52) shows that the mean of $S_n$, being the mean of
$Y_1 + Y_2 + \cdots + Y_{n-1}$, is

$$\text{mean of } S_n = \theta g_1, \tag{9.53}$$

where $g_1$ is defined by

$$g_1 = \sum_{j=1}^{n-1} \frac{1}{j}. \tag{9.54}$$

Similarly, the variance of $S_n$ is (Watterson (1975))

$$\operatorname{var}(S_n) = g_1 \theta + g_2 \theta^2, \tag{9.55}$$

where $g_1$ is defined in (9.54), and

$$g_2 = \sum_{j=1}^{n-1} \frac{1}{j^2}. \tag{9.56}$$


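The complete distribution of $S_n$ was found by Tavaré (1984), who showed
that

$$\text{Prob}(S_n = s) = \frac{n - 1}{\theta} \sum_{j=1}^{n-1} (-1)^{j-1} \binom{n - 2}{j - 1} \left( \frac{\theta}{j + \theta} \right)^{s+1}. \tag{9.57}$$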
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This probability can also be found from the recurrence relation (10.11) 
below, and the discussion in Chapter 10 shows why this recurrence relation 
holds. 
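The distribution of $S_n$ can be computed directly from the representation (9.52) by convolving the $n - 1$ geometric distributions, and the result compared with a closed form. In the sketch below the alternating-sum expression is the form usually attributed to Tavaré (1984); it is stated here as an assumption and checked numerically rather than derived, and the function names are illustrative.

```python
from math import comb

def pmf_by_convolution(n: int, theta: float, s_max: int) -> list[float]:
    """P(S_n = s), s = 0..s_max, as a sum of n-1 independent geometrics (9.52)."""
    pmf = [1.0] + [0.0] * s_max                  # distribution of an empty sum
    for j in range(1, n):
        p, r = j / (j + theta), theta / (j + theta)
        geom = [p * r ** i for i in range(s_max + 1)]
        pmf = [sum(pmf[k] * geom[s - k] for k in range(s + 1))
               for s in range(s_max + 1)]
    return pmf

def pmf_closed_form(n: int, theta: float, s: int) -> float:
    """Alternating-sum closed form (assumed form, checked against the convolution)."""
    total = sum((-1) ** (j - 1) * comb(n - 2, j - 1) * (theta / (j + theta)) ** (s + 1)
                for j in range(1, n))
    return (n - 1) / theta * total

n, theta = 6, 1.5                                # illustrative values
pmf = pmf_by_convolution(n, theta, 200)
mean = sum(s * p for s, p in enumerate(pmf))
var = sum(s * s * p for s, p in enumerate(pmf)) - mean ** 2
print(f"mean = {mean:.4f}, variance = {var:.4f}")
print(max(abs(pmf[s] - pmf_closed_form(n, theta, s)) for s in range(15)))
```

The alternating sum loses accuracy in floating point when $n$ is large, so for large samples the convolution (or a recurrence) is the safer route.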

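Summation in (9.57) shows that the probability that $S_n \le s$, which we
denote by $F(s, \theta)$, is given by

$$F(s, \theta) = 1 - \sum_{j=1}^{n-1} (-1)^{j-1} \binom{n - 1}{j} \left( \frac{\theta}{j + \theta} \right)^{s+1}. \tag{9.58}$$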



We now turn to questions of statistical inference, and consider first the
estimation of $\theta$. Equation (9.53) implies that an unbiased estimator
$\hat\theta_S$ of $\theta$ is (Ewens (1974b), implicit in Watterson (1975))

$$\hat\theta_S = \frac{S_n}{g_1}, \tag{9.59}$$

where $g_1$ is defined in (9.54). If $s_n$ is the observed value of $S_n$
derived from a particular sample, the corresponding estimate of $\theta$ is,
immediately,

$$\hat\theta_s = \frac{s_n}{g_1}. \tag{9.60}$$

Equation (9.55) implies that the variance of $\hat\theta_S$ is

$$\operatorname{var}(\hat\theta_S) = \frac{\theta}{g_1} + \frac{g_2 \theta^2}{g_1^2}. \tag{9.61}$$

Equation (9.49) shows that another possible estimator of $\theta$ is provided
by taking all possible pairs of genes and finding the average number of sites
at which the two genes in a pair differ. Equation (9.59) provides a different
estimator of $\theta$. Both estimators are based on the assumption of selective
neutrality and can be expected to differ when selection exists. This
observation forms the basis of one test of selective neutrality, discussed at
greater length in Chapter 11.
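Both estimators are simple to compute from aligned sequences. The sketch below uses a small made-up data set in which each sequence is a string of 0s and 1s (one character per site); the data, the function names, and the encoding are all illustrative assumptions.

```python
from itertools import combinations

def segregating_sites(seqs: list[str]) -> int:
    """Number of sites (columns) at which the sample is not monomorphic."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def theta_from_segregating_sites(seqs: list[str]) -> float:
    """Estimate (9.60): s_n / g_1 with g_1 = 1 + 1/2 + ... + 1/(n-1)."""
    g1 = sum(1.0 / j for j in range(1, len(seqs)))
    return segregating_sites(seqs) / g1

def theta_from_pairwise_differences(seqs: list[str]) -> float:
    """Average, over all pairs of genes, of the number of sites at which they differ."""
    pairs = list(combinations(seqs, 2))
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

sample = ["0000", "1000", "1100", "0011"]    # hypothetical sample of n = 4 genes
print(round(theta_from_segregating_sites(sample), 4))
print(round(theta_from_pairwise_differences(sample), 4))
```

With real data the two values typically differ somewhat even under neutrality, so a formal comparison has to take their sampling variability into account.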

The expression $\theta x^{-1}$ found in (9.18) applies to a sample of genes as
well as to the entire population, in that for those sites segregating in the
sample, the mean number of sites, in a sample of $n$ sequences, at which there
are $j$ representatives of the mutant nucleotide is

$$\frac{\theta}{j}, \qquad j = 1, 2, \ldots, n - 1. \tag{9.62}$$

This is the sample analogue of the population frequency spectrum (9.17),
and could be called the sample frequency spectrum. Equivalently, given that
a site is segregating, the probability that the mutant nucleotide appears $j$
times ($j = 1, 2, \ldots, n - 1$) is

$$\frac{1}{j g_1}. \tag{9.63}$$

Passing to a continuous approximation, the mean number of sites for
which the frequency of the mutant nucleotide is in the range
$(x, x + \delta x)$ is

$$\frac{\theta}{x}\,\delta x, \qquad \frac{1}{n} \le x \le \frac{n - 1}{n}. \tag{9.64}$$

The expression (9.62) leads to what might be called the conditional mutant
nucleotide frequency spectrum, given the number $S_n$ of segregating sites and
the fact that $S_n/g_1$ is an unbiased estimator of $\theta$, as

$$\frac{S_n}{j g_1}. \tag{9.65}$$

We return to this result in Section 11.3.2.

In many cases the mutant nucleotide might not be distinguishable from
the original nucleotide, and in these cases a more relevant calculation is
that the mean number of sites at which there are j representatives of one
nucleotide and n - j of another is

$$\theta j^{-1} + \theta (n-j)^{-1} = \frac{n\theta}{j(n-j)}. \tag{9.66}$$

We could call (9.66) the sample frequency spectrum. The parallel
conditional frequency spectrum is

$$\frac{n S_n}{j(n-j)g_1}. \tag{9.67}$$

(An obvious modification to both these formulas is needed when n is even
and j = n/2.)
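
The modification for even n can be made concrete with a short Python sketch. It rests on one assumption that the text does not spell out: when j = n/2 the classes "j copies" and "n - j copies" coincide, so the site is counted once and the mean is $\theta/j$ rather than the value given by (9.66). The function name is hypothetical.

```python
from fractions import Fraction

def folded_sfs(theta, n):
    """Mean number of sites with a j : (n - j) split, j = 1..floor(n/2)."""
    theta = Fraction(theta)
    spectrum = {}
    for j in range(1, n // 2 + 1):
        if 2 * j == n:
            # Assumed modification: j and n - j are the same class,
            # so the class is counted once.
            spectrum[j] = theta / j
        else:
            # Equation (9.66).
            spectrum[j] = n * theta / (j * (n - j))
    return spectrum

if __name__ == "__main__":
    print({j: float(v) for j, v in folded_sfs(1, 6).items()})
```

With this convention the folded means add up to $\theta g_1$, the mean number of segregating sites, for both even and odd n.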

The expressions in (9.36) show that $\theta$ does not admit unbiased estimation
using infinitely many alleles data. By contrast, it is clear from the
above that both $\theta$ and $\theta^2$ admit unbiased estimation in the infinitely many
sites case, using estimators based on $S_n$ and $S_n^2$. This makes it all the more
remarkable that, whereas the infinitely many alleles quantity $K_n$ is a sufficient
statistic for $\theta$ in that model, $S_n$ is not a sufficient statistic for $\theta$ in the
infinitely many sites model. This implies that in the infinitely many sites
model, the data in a sample of genes beyond the information given by $S_n$
on its own can in principle be used to provide better estimation of $\theta$ than
that provided through $S_n$ only. We return to this point below.

We make four remarks about the variance (9.61). First, with free recombination
between sites and a Poisson mutation process, the variance of $\hat\theta_S$
is the first term on the right-hand side of (9.61). It is thus plausible that
with small but nonzero recombination between sites, the variance of $\hat\theta_S$ is
slightly less than the value given in (9.61), but on the other hand exceeds
the first term on the right-hand side of (9.61).

The second comment follows from the first. The variance (9.61) of $\hat\theta_S$
applies for completely linked sites. Despite this, the expression (9.61) is
often used for data arising from a sample of many genes, often on several
different chromosomes. The bounds just described might then be useful for
such samples in assessing the variance of $\hat\theta_S$.

Third, the variance (9.61) is of order $1/\log n$ and is thus quite large even
for large n. The same is true of the variance of the estimator (9.40). This
implies that neither $K_n$ nor $S_n$ provides reliable estimation of $\theta$. A variance
of order $1/\log n$, rather than the classic statistical order $1/n$, arises in both
cases because of the dependence between the genes in the sample arising
from their common ancestry, a matter taken up in more detail in Chapter
10.

Finally, we can compare the variance (9.61) of $\hat\theta_S$ with the approximate
infinitely many alleles mean square error (MSE) of $\hat\theta_K$, given in (9.40). This
comparison shows that the variance of $\hat\theta_S$ is always less than the MSE of
$\hat\theta_K$. While for small $\theta$ the two expressions are quite close, for $\theta = 1$ the
variance of $\hat\theta_S$ is about 94% of the MSE of $\hat\theta_K$, and as $\theta$ increases, the
variance of $\hat\theta_S$ becomes increasingly small relative to the MSE of $\hat\theta_K$. This
confirms the general principle that estimation of $\theta$ using $S_n$ is better than
that using $K_n$.

Despite the fact that $\hat\theta_S$ provides more precise estimation of $\theta$ than does
$\hat\theta_K$, it is in principle possible, as stated above, to employ more detailed
“sites” data to find a better estimator of $\theta$ than that provided by using
only $S_n$, which ignores aspects of these more detailed data. This matter has
been discussed at length in the literature. Optimal estimation in statistics
arises through the method of maximum likelihood, and thus the aim is to
find the likelihood of a sample of n genes, the data in this sample involving
not only the value of $S_n$ but the complete configuration of the nucleotides
at the various segregating sites.

Unfortunately an explicit expression for this likelihood depends on historical
factors concerning when the various mutational events occurred, to
be discussed in detail in Chapter 10. This information is of course not directly
available from the data in the sample, although it might be possible
to infer it from “out-group” data. The interpretation of $Y_j$ given above is
that $Y_j$ is the number of mutations arising in the ancestry of the sample
during those times when the n genes in the sample have exactly j + 1 distinct
ancestor genes. If the $Y_j$ were known, the likelihood of the data would
be

$$\theta^{S_n}\,(n-1)!\,\prod_{j=1}^{n-1} (j+\theta)^{-(Y_j+1)}. \tag{9.68}$$

This leads to an implicit equation for the maximum likelihood estimator $\hat\theta$ of
$\theta$, namely,

$$\frac{S_n}{\hat\theta} = \sum_{j=1}^{n-1} \frac{Y_j + 1}{j + \hat\theta} \tag{9.69}$$

(Fu and Li (1993)). Standard statistical maximum likelihood theory then
shows that the variance of any unbiased estimator of $\theta$ cannot be less than

$$\frac{\theta}{\sum_{j=1}^{n-1} \dfrac{1}{j+\theta}},$$

and that the maximum likelihood estimator of $\theta$ using $Y_1, \ldots, Y_{n-1}$ has
a variance achieving this bound. The variance (9.61) of $\hat\theta_S$ exceeds this
bound, and when $\theta$ is large it can significantly exceed this bound.
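
The size of the gap between the variance of $\hat\theta_S$ and the lower bound can be explored numerically. The Python sketch below is illustrative only: it assumes that (9.61) has the form $\theta/g_1 + g_2\theta^2/g_1^2$, obtained by dividing the variance $g_1\theta + g_2\theta^2$ of $S_n$ by $g_1^2$, with $g_1 = \sum_{j=1}^{n-1} j^{-1}$ and $g_2 = \sum_{j=1}^{n-1} j^{-2}$. The function names are hypothetical.

```python
def g1(n):
    return sum(1.0 / j for j in range(1, n))

def g2(n):
    return sum(1.0 / j ** 2 for j in range(1, n))

def var_theta_s(theta, n):
    """Variance of S_n / g_1, assuming var(S_n) = g1*theta + g2*theta**2."""
    return theta / g1(n) + g2(n) * theta ** 2 / g1(n) ** 2

def lower_bound(theta, n):
    """The bound theta / sum_{j=1}^{n-1} 1/(j + theta)."""
    return theta / sum(1.0 / (j + theta) for j in range(1, n))

if __name__ == "__main__":
    for theta in (0.1, 1.0, 10.0):
        for n in (10, 50, 200):
            ratio = var_theta_s(theta, n) / lower_bound(theta, n)
            print(f"theta={theta:5.1f}  n={n:4d}  variance/bound={ratio:.3f}")
```

For n = 2 the two quantities coincide, and the printed ratio shows how the excess over the bound changes with $\theta$ and n under the stated assumption.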

These calculations suggest that approaches to estimation taking historical
factors into account should be useful. This implies that computationally
intensive methods, using the coalescent of the sample of
genes (see Chapter 10), are needed. This theory was developed by Griffiths
and Tavaré (1994a,b,c, 1995, 1997, 1999) and from the point of view
of Markov chain Monte Carlo (MCMC) methods by Kuhner et al. (1995,
1998). The approach of Griffiths and Tavaré uses a version of importance
sampling, and this observation led Stephens and Donnelly (2000) to improved
estimation methods using a new importance sampling approach.
The details of these procedures will be discussed in Volume II.



9.6.3 The Moran Model 

As might be expected, there are many exact results available for the sample
properties in the infinitely many sites Moran model. These can often
be found directly from the corresponding population formulas by simply
replacing 2N in the latter by n. For example, the mean of the number $S_n$
of segregating sites in the sample is given by (9.24) with 2N replaced by
n, and the comments following (9.24) continue to apply with this replacement.
The probability of sample monomorphism is given by (9.25) with 2N
replaced by n. In these formulas the definition of $\theta$ given in (9.27) is used,
as it is throughout this section. Many exact sample results for the Moran
model follow immediately from the population results given above.

However, other exact properties of the Moran model are not so easily
obtained, and we now discuss these, beginning with the simplest case n = 2.
The case n = 2 is of particular interest, since it relates to the homozygosity
probability, or more precisely, since the Moran model relates to haploids, to
the probability that two randomly chosen genes have identical nucleotide
sequences. In the Poisson mutation input case this is found immediately
from the expression (9.25) by putting 2N = 2 to get $1/(2Ne^{v} - 2N + 1)$.
This is very close to the diffusion approximation $1/(1 + \theta)$. For the general
mutation input distribution model (9.16), the homozygosity probability is
$q_0/(2N - (2N-1)q_0)$.

In the case of the general mutation input distribution (9.16), the complete
distribution of $S_2$ is that of the sum of M random variables, each having the
general mutation distribution, where M itself is a random variable having
the geometric distribution

$$\text{Prob}(M = m) = \frac{1}{2N}\left(1 - \frac{1}{2N}\right)^{m-1}, \qquad m = 1, 2, 3, \ldots. \tag{9.70}$$

We shall see in Chapter 10 why this representation exists. As a check
on (9.70), when M = m the probability that $S_2 = 0$ is $q_0^m$. Thus the
unconditional probability that $S_2 = 0$ is

$$\frac{q_0}{2N} \sum_{m=1}^{\infty} \left\{ q_0 \left(1 - \frac{1}{2N}\right) \right\}^{m-1},$$

and this reduces immediately to the expression $q_0/(2N - (2N-1)q_0)$ given
above.
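
The check just described is easy to repeat numerically. The following Python sketch assumes that the geometric distribution of M has success probability 1/(2N), the value for which the series reduces to the closed form quoted above. It also evaluates the Poisson case $q_0 = e^{-v}$. The function names and parameter values are illustrative.

```python
import math

def identity_prob_series(q0, two_n, terms=20000):
    """Sum over m of Prob(M = m) * q0**m, with M geometric, p = 1/(2N)."""
    p = 1.0 / two_n
    return sum(p * (1.0 - p) ** (m - 1) * q0 ** m for m in range(1, terms + 1))

def identity_prob_closed(q0, two_n):
    """Closed form q0 / (2N - (2N - 1) q0)."""
    return q0 / (two_n - (two_n - 1) * q0)

def identity_prob_poisson(v, two_n):
    """Poisson input with mean v: 1 / (2N e^v - 2N + 1)."""
    return 1.0 / (two_n * math.exp(v) - two_n + 1)

if __name__ == "__main__":
    two_n, v = 100, 0.01
    q0 = math.exp(-v)
    print(identity_prob_series(q0, two_n))
    print(identity_prob_closed(q0, two_n))
    print(identity_prob_poisson(v, two_n))
    print(1.0 / (1.0 + two_n * v))   # diffusion value 1/(1 + theta), theta = 2Nv assumed
```

The first three printed values should agree to rounding error, and the last shows how close the diffusion approximation is for these parameter values.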

We now turn to the distribution of $S_n$ for arbitrary n. For the Poisson
input mutation case, Watterson (1975) found a probability generating function
for this distribution, which implies that it can be represented as the
distribution of the sum of n - 1 random variables $Y_1, Y_2, \ldots, Y_{n-1}$, parallel
to those surrounding (9.52). In this case, the random variable $Y_j$ has the
distribution of the sum of $M_j$ independent random variables, each having
the Poisson input mutation distribution with mean v. Here $M_j$ is itself a
random variable having a geometric probability distribution generalizing
(9.70), namely,

$$\text{Prob}(M_j = m) = \frac{j}{2N}\left(1 - \frac{j}{2N}\right)^{m-1}, \qquad m = 1, 2, 3, \ldots. \tag{9.71}$$

This representation leads to a form for the distribution of $S_n$ that is more
complicated than the Wright-Fisher diffusion approximation (9.57). It does
however allow an exact calculation for the variance of $S_n$ for the Poisson
mutation case, namely

$$\text{var}(S_n) = g_1\theta + g_2\theta^2 - \frac{g_1\theta^2}{2N}, \tag{9.72}$$

with $\theta$ defined as 2Nv, and with $g_1$ and $g_2$ defined respectively in (9.54)
and (9.56).
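
The step from the compound representation to the variance of $S_n$ can be sketched numerically. The Python code below is a consistency check under explicit assumptions, not a transcription of Watterson's derivation: it takes $M_j$ to be geometric on 1, 2, 3, ... with success probability j/(2N), takes the $Y_j$ to be independent, and uses the standard formula for the variance of a random sum. The closed form it is compared with, $g_1\theta + g_2\theta^2 - g_1\theta^2/(2N)$ with $\theta = 2Nv$, is the expression these assumptions produce.

```python
def var_sn_compound(theta, n, two_n):
    """Variance of S_n = Y_1 + ... + Y_{n-1} under the assumed representation."""
    v = theta / two_n                      # Poisson mean per mutation input
    total = 0.0
    for j in range(1, n):
        p = j / two_n                      # assumed geometric parameter of M_j
        mean_m = 1.0 / p
        var_m = (1.0 - p) / p ** 2
        # Random-sum variance: E[M] * var(X) + var(M) * E[X]**2,
        # with X Poisson(v), so var(X) = E[X] = v.
        total += mean_m * v + var_m * v ** 2
    return total

def var_sn_closed(theta, n, two_n):
    g1 = sum(1.0 / j for j in range(1, n))
    g2 = sum(1.0 / j ** 2 for j in range(1, n))
    return g1 * theta + g2 * theta ** 2 - g1 * theta ** 2 / two_n

if __name__ == "__main__":
    print(var_sn_compound(2.0, 10, 1000), var_sn_closed(2.0, 10, 1000))
```

The correction term is of order 1/N, which is why it disappears in the diffusion limit.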

The geometric mutation input distribution (9.28) was introduced in Section
9.4.3. This input distribution has the further interesting property that,
perhaps uniquely, it allows an explicit form for the distribution of $S_n$. This
distribution is again represented as that of the sum of random variables
$Y_1, Y_2, \ldots, Y_{n-1}$, whose distribution is given exactly by (9.52), with $\theta$ defined
by (9.27). It follows from this that the distribution of $S_n$ is given
exactly by (9.57). This implies that the variance of $S_n$ is given exactly
by (9.55), a slightly different value from the Poisson input mutation value
given in (9.72). This shows that the variance of $S_n$ depends on the nature
of the input mutation process. However, the difference between the two
variances is small, and it vanishes in the diffusion approximation limit.



9.7 Relation Between Infinitely Many Alleles and Infinitely Many Sites Models

It was remarked several times above that when there is no recombination
within the gene of interest, the infinitely many sites model acts
as an infinitely many alleles model, so that the two models share various 
formulas in common. However, when recombination does exist, new alleles 
can be created in the infinitely many sites model by recombination. This 
process has no analogue in the infinitely many alleles model, for which new 
allelic types are assumed to arise from “normal” mutational nucleotide 
changes. Properties of the formation of new alleles by recombination and 
by nucleotide mutation are different: For example, in a population contain- 
ing one allelic type only, new alleles cannot be formed by recombination, 
whereas they can be formed by nucleotide mutation. 

Since the sampling formulas (3.83)-(3.85) are used frequently in population
genetics theory, it is important to ask how satisfactory they are for
an infinitely many sites model with recombination. Unfortunately, in the
infinitely many sites model, the distribution of the number of alleles, that
is, of the number of distinct nucleotide sequences, appears to be very difficult
to obtain. The same is true of the distribution of their frequencies.
One view is that the generation of a new allele through intracistronic
recombination can be regarded for all practical purposes as a new “normal”
mutation, so that (3.83)-(3.85) still apply to a close approximation, with a
new definition of $\theta$ embracing the possibility of “mutation” through
recombination. It is at least clear that this view is not justified. The
following analysis, due to Strobeck and Morgan (1978), shows this.

Strobeck and Morgan considered two sites in a gene and supposed that 
mutation occurs at each site at rate v, all mutations being new. A more 
realistic model takes into account the fact that there are only three possible 
“new” mutant nucleotides, but for the small values of v appropriate to 
nucleotide mutation rates the two models are probably reasonably close. 
In any event, several of the formulas given below are easily amended to 
the more accurate model. We denote the recombination fraction between
sites by R, with $R \ll 1$, and the population size by N, and assume that
a multi-site neutral Wright-Fisher model is applicable.

We now consider four such “two-site” genes, labeled for convenience
$(a_1 b_1)$, $(a_2 b_2)$, $(a_3 b_3)$, and $(a_4 b_4)$. Here $a_i$ is the nucleotide at site 1 in
gene i, and $b_i$ is the nucleotide at site 2 in gene i. We define the symbol
“$\equiv$” to denote identity of nucleotide type and define

$$F_A = \text{Prob}(a_i \equiv a_j), \quad F_B = \text{Prob}(b_i \equiv b_j), \quad (i \ne j), \tag{9.73}$$

$$F_{AB} = \text{Prob}(a_i \equiv a_j,\ b_i \equiv b_j), \quad (i \ne j), \tag{9.74}$$

$$G = \text{Prob}(a_i \equiv a_j,\ b_i \equiv b_k), \quad (i \ne j \ne k), \tag{9.75}$$

$$G^* = \text{Prob}(a_i \equiv a_j,\ b_k \equiv b_l), \quad (i \ne j \ne k \ne l). \tag{9.76}$$

The formal mathematics of the evolution of the two-site system is now
identical to that of the two-locus system considered in Sections 3.6 and
3.9. In particular, from (3.74), the equilibrium values of $F_A$ and $F_B$ are

$$F_A = F_B = (1 + \theta)^{-1}, \tag{9.77}$$

where $\theta = 4Nv$. (In the more accurate model allowing four nucleotides
only, the equilibrium value from (3.70) with K = 4 is $(3 + \theta)/(3 + 4\theta)$.)

A recurrence relation analogous to (3.73) can be found for $F_{AB}$: This
was first done, in the context of two-locus models, by Serant (1974). This
recurrence relation takes into account the possibilities of no, one, or two
recombination events between the sites. If terms in $N^{-2}$, $R^2$, and $v^2$ are
ignored, the recurrence relation is

$$F'_{AB} = (1 - 4v)\left(\{1 - 2R\}\{(2N)^{-1} + (1 - (2N)^{-1})F_{AB}\} + RG\right). \tag{9.78}$$

Similar recurrence relations hold for $G$ and $G^*$, and simultaneous equilibrium
solutions of all equations may easily be found. The solution depends
on the relative order of magnitude assumptions made about R and v. When
$N \gg 1$ and $v = O(N^{-1})$ it is found, for example, that

$$\text{(1)} \quad R \ll v: \qquad F_{AB} \approx (1 + 2\theta)^{-1}, \tag{9.79}$$

$$\text{(2)} \quad R \approx v: \qquad F_{AB} \approx \alpha \ \text{(see below)}, \tag{9.80}$$

$$\text{(3)} \quad R \gg v: \qquad F_{AB} \approx (1 + \theta)^{-2}, \tag{9.81}$$

where

$$\alpha = \frac{2\theta^3 + \theta^2\xi + 11\theta^2 + 6\theta\xi + 2\xi^2 + 18\theta + 13\xi + 9}{(1 + \theta)\{4\theta^3 + 6\theta^2\xi + 2\theta\xi^2 + 20\theta^2 + 19\theta\xi + 2\xi^2 + 27\theta + 13\xi + 9\}}$$

and $\xi = 2NR$. It is clear why the values (9.79) and (9.81) arise. In (9.79)
the recombination fraction is so low that the system can effectively be considered
to be a one-site system with mutation rate 2v, while in (9.81) the
recombination rate is high enough so that the sites act effectively independently.
These conclusions are analogous to the two interpretations of the
fixation probability (3.138) for large and small NR considered in Section
9.1. Since v is a nucleotide mutation rate, of order $10^{-8}$ or $10^{-9}$, we may
expect $\theta$ to be quite small for all populations of size $10^6$ or fewer, in which
case the three equilibrium values of $F_{AB}$ are quite close. For larger values
of $\theta$, however, this is not the case. Thus for $\theta = 4$, $F_{AB}$ decreases from
0.1111 at R = 0 to 0.0641 at R = 4v and from 0.0463 at R = 20v to 0.0400
when $R \gg v$. The formulas (9.79)-(9.81) were checked by simulation by
Strobeck and Morgan (1978).

These simulations also allow a check to be made of the adequacy of
(3.84) and (9.30) for the distribution of allele number and frequencies in
the present model. Watterson (1974b) found that if (3.84) and (9.30) hold,
the variance in heterozygosity will be given by

$$\text{Var}(F) = \frac{2\theta}{(1 + \theta)^2(2 + \theta)(3 + \theta)}, \tag{9.82}$$

as in (5.138). This formula can be used for comparison with empirical
values of the variance of F once an adequate definition of $\theta$ can be made.
Strobeck and Morgan (1978) do this by defining $\theta$ as the solution of the
equation

$$F_{AB} = (1 + \theta)^{-1}, \tag{9.83}$$

suggested by (5.138), where $F_{AB}$ is given by (9.79)-(9.81). In Table 9.2
we give values of $F_{AB}$, $\theta$ as computed from (9.83), the variance of F as
computed from (9.82) and empirical values of this variance, found from
simulations. The latter differ consistently from the values calculated from
(9.82) for 4Nv > 1, so we conclude, at least for these parameter values, that
(3.84) and (9.30) do not apply for the two-site model with recombination.

                          v = 0.00125
R                         0         4v        10v       20v
F_AB                      0.5000    0.4758    0.4629    0.4552
theta                     1.0000    1.1017    1.1603    1.1968
var(F) (theoretical)      0.0417    0.0392    0.0378    0.0370
var(F) (empirical)        0.0410    0.0381    0.0437    0.0391

                          v = 0.005
R                         0         4v        10v       20v
F_AB                      0.2000    0.1477    0.1301    0.1215
theta                     4.0000    6.0572    6.6864    7.2305
var(F) (theoretical)      0.0076    0.0037    0.0027    0.0023
var(F) (empirical)        0.0088    0.0050    0.0047    0.0051

Table 9.2. Values of $F_{AB}$ calculated from (9.79)-(9.81), values of $\theta$ thus calculated
from (9.83), values of the variance of F calculated from (9.82), and empirical
values of this variance (Strobeck and Morgan, 1978) for N = 100 and various
values of R and v.
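
The theoretical rows of Table 9.2 can be recomputed with a few lines of Python. This is a hedged sketch: the polynomial coefficients are transcribed from the expression following (9.81), and the 13ξ term in the denominator is an assumption that has been checked only against the $F_{AB}$ row of the table. The function names are hypothetical, and N = 100 with the R multiples 0, 4, 10, 20 are the values quoted in the table caption and header.

```python
def f_ab(theta, xi):
    """Equilibrium F_AB as a function of theta = 4Nv and xi = 2NR."""
    t, x = theta, xi
    num = 2*t**3 + t**2*x + 11*t**2 + 6*t*x + 2*x**2 + 18*t + 13*x + 9
    den = (1 + t) * (4*t**3 + 6*t**2*x + 2*t*x**2 + 20*t**2 + 19*t*x
                     + 2*x**2 + 27*t + 13*x + 9)
    return num / den

def effective_theta(f):
    """Solve F_AB = 1/(1 + theta) for theta, as in (9.83)."""
    return 1.0 / f - 1.0

def var_f(theta):
    """Variance of F from (9.82)."""
    return 2 * theta / ((1 + theta) ** 2 * (2 + theta) * (3 + theta))

if __name__ == "__main__":
    N = 100
    for v in (0.00125, 0.005):
        theta = 4 * N * v
        for mult in (0, 4, 10, 20):
            xi = 2 * N * mult * v
            f = f_ab(theta, xi)
            print(f"v={v}  R={mult}v  F_AB={f:.4f}  "
                  f"var(F)={var_f(effective_theta(f)):.4f}")
```

At ξ = 0 the function reduces to $(1 + 2\theta)^{-1}$, and as ξ grows it approaches $(1 + \theta)^{-2}$, in line with the two limiting cases.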

It is difficult to find properties of the distribution of the number of alleles
in this model theoretically. Strobeck and Morgan observe in their simulations
that whereas for R = 0 the mean number of alleles somewhat exceeds
the variance in the number of alleles, as may be deduced from (3.84), this
no longer applies when R > 0, so that, for example, for R = 20v the variance
is slightly in excess of the mean for v = 0.00125 and more than twice
the mean for v = 0.005. Thus (3.84) cannot hold for such values of R, and
the conditional distribution (9.30), upon which some of the tests of neutrality
considered in Chapter 11 are based, is also suspect. These observations
confirm those made from consideration of the homozygosity.

It is clearly important to assess realistic values of the scaled parameters
$\xi = 2NR$ and $\theta = 4Nv$. Since v is a nucleotide mutation rate, we may
expect $v \approx 10^{-8}$ or $10^{-9}$. Typical values of R are less precisely known: Possibly
values of order $10^{-5}$ may be expected. These values certainly imply $R \gg v$,
and the values in Table 9.2 then suggest that (3.84) and (9.30) are in doubt
if v is sufficiently large. Unfortunately, the simulation values possibly do
not cover the (R, v) combinations of most importance, and extrapolation
from Table 9.2 is difficult. A conservative argument is to note that the
effect of recombination is certainly less for $v = 10^{-8}$ than for $v = 10^{-6}$.
But for $v = 10^{-6}$, $R = 10^{-5}$, N = 125,000, the values in Table 9.2 suggest
that (3.84) and (9.30) might apply to a reasonable approximation.



9.8 Genetic Variation Within and Between 
Populations 

In Chapter 12 we shall examine aspects of the evolution of genetic material 
in different populations or even species. In this section we consider how 
genetic variation at the molecular level can be divided, at least approxi- 
mately, into “within” and “between” population components by an analysis 
of variance technique. Although the approach considered has points of sim- 
ilarity with that of Lewontin (1973), who uses entropy measures instead of 
sums of squares, it is based essentially on ANOVA concepts and the work 
of Wright (1943, 1951, 1965a) and Nei (1973). 

Suppose that a sample of n genes is taken from each of k populations
and that at any chosen nucleotide site only two nucleotides are observed in
the entire sample. Define $y_{ij}$ by

$$y_{ij} = \begin{cases} +1 & \text{if the $j$th gene in the $i$th population contains nucleotide 1,} \\ 0 & \text{if the $j$th gene in the $i$th population contains nucleotide 2.} \end{cases} \tag{9.84}$$

Then the classical analysis of variance sums of squares

$$\begin{aligned} \sum_i \sum_j (y_{ij} - \bar y_i)^2 &= \text{within group sum of squares}, \\ n \sum_i (\bar y_i - \bar y)^2 &= \text{between group sum of squares}, \end{aligned} \tag{9.85}$$

become, with the identification (9.84),

$$n \sum_i x_i(1 - x_i) \quad \text{and} \quad n \sum_i (x_i - \bar x)^2 \tag{9.86}$$

respectively, where $x_i$ is the frequency of nucleotide 1 in the sample from
population i, and $\bar x$ is the average frequency over all samples. If $\sigma_w^2$ is the
within-group variance in frequency and $\sigma_b^2$ the between-group variance, the
sums of squares in (9.86) are unbiased estimators of

$$k(n-1)\sigma_w^2 \quad \text{and} \quad (k-1)\sigma_w^2 + n(k-1)\sigma_b^2, \tag{9.87}$$

respectively, so that $\sigma_w^2$ and $\sigma_b^2$ can be estimated by

$$\hat\sigma_w^2 = n\{k(n-1)\}^{-1} \sum_i x_i(1 - x_i) \tag{9.88}$$

and

$$\hat\sigma_b^2 = (k-1)^{-1} \sum_i (x_i - \bar x)^2 - \{k(n-1)\}^{-1} \sum_i x_i(1 - x_i). \tag{9.89}$$

The estimator (9.88) is necessarily nonnegative, whereas the right-hand
side in (9.89) can be negative: If it is, we conventionally put $\hat\sigma_b^2 = 0$.

A measure of within- and between-group variation can now be found by
averaging $\hat\sigma_w^2$ and $\hat\sigma_b^2$ over a number of nucleotide sites: This is in effect the
procedure of Lewontin (1973). However, the ability to allocate individuals
to groups with high success on the basis of genetic characteristics is not
incompatible with a high $\hat\sigma_w^2/\hat\sigma_b^2$ ratio, since such an allocation can take
advantage of multivariate analysis of variance techniques, and does not
rely on simple averaging of $\hat\sigma_w^2$ and $\hat\sigma_b^2$ values.
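
For a single nucleotide site, the two estimators can be computed directly from the sample frequencies. The Python sketch below follows the standard one-way analysis of variance with k populations and n genes per sample: the within-group estimate is n/{k(n-1)} times the sum of $x_i(1 - x_i)$, and the between-group estimate is the mean square between groups less the within-group estimate, divided by n and set to zero when negative. The function name and the example frequencies are illustrative.

```python
def variance_components(freqs, n):
    """Within- and between-population variance estimates at one site.

    freqs: sample frequencies x_i of nucleotide 1, one per population.
    n:     number of genes sampled from each population.
    """
    k = len(freqs)
    xbar = sum(freqs) / k
    within = sum(x * (1.0 - x) for x in freqs)
    between = sum((x - xbar) ** 2 for x in freqs)
    sigma_w2 = n * within / (k * (n - 1))
    sigma_b2 = between / (k - 1) - within / (k * (n - 1))
    # A negative between-group estimate is conventionally replaced by zero.
    return sigma_w2, max(sigma_b2, 0.0)

if __name__ == "__main__":
    print(variance_components([0.2, 0.8], n=10))
    print(variance_components([0.5, 0.5, 0.5], n=10))
```

Averaging the two returned values over many sites gives the kind of summary described above.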



9.9 Age-Ordered Alleles: Frequencies and Ages 

The current direction of interest in population genetics is a retrospective 
one, looking backward to the past rather than (as with much of the theory 
in this book) looking forward into the future. This change of direction 
is largely spurred by the large volume of genetic data now available at 
the molecular level and a wish to infer the forces that led to the data 
observed. Tests of the neutral theory, discussed in Chapter 11, form one 
such inferential procedure. 

Far more important, however, is the retrospective process associated with 
the coalescent, discussed at greater length in Chapter 10. The concept of 
the coalescent leads naturally into a discussion of the age properties of 
alleles as well as a discussion of age-ordered allele frequencies. This topic 
has recently been reviewed by Slatkin and Rannala (2000). The discussion 
in this section does not aim at a general overview such as that provided 
by Slatkin and Rannala. Instead it is more specific, being slanted toward 
explaining some of the formulas in previous sections of this chapter by using 
age properties of alleles, and then as an introduction to further explanations
using coalescent theory. 

The material in this section covers both sample and population formulas 
relating to the infinitely many alleles model. Some results are diffusion 
approximations, and for them the definition of $\theta$ depends on the population
model implicitly discussed. Various formulas for the Moran model are exact. 
The concept of reversibility was introduced in Section 2.12. This concept 
can be used to derive age properties from the prospective theory, and vice 
versa. Reversibility arguments were used, for example, in deriving (5.134), 
and further examples of the form of argument leading to (5.134) will be 
given later. We shall freely use reversibility arguments throughout, relying 
on reversibility properties of the diffusion process and also of the Moran 
infinitely many alleles model.

We first discuss allelic frequencies, for which finding “age” properties
amounts to finding size-biased properties. Kingman’s (1975) Poisson-Dirichlet
distribution was introduced in Section 5.10. Unfortunately, this
distribution is not user-friendly, as, for example, (5.130) and (5.131) imply.
This makes it all the more interesting that a size-biased distribution closely
related to it, namely the GEM distribution, named for Griffiths (1980),
Engen (1975) and McCloskey (1965), who established its salient properties,
is both simple and elegant. More important, it has a central interpretation
with respect to the ages of the alleles in a population. We now describe
this distribution.

The ordered allelic frequencies in the population follow the Poisson-Dirichlet
distribution. Suppose that a gene is taken at random from the
population. The probability that this gene will be of an allelic type whose
frequency in the population is x is just x. In other words, alleles are sampled
by this choice in a size-biased way. It can be shown from properties of the
Poisson-Dirichlet distribution that the (random) frequency of the allele
determined by this randomly chosen gene has density function

$$f(x) = \theta(1 - x)^{\theta - 1}. \tag{9.90}$$

This result also follows from the frequency spectrum (3.95): The probability
that there exists an allele in the population with frequency between x and
$x + \delta x$, and that the gene chosen is of this allelic type, is
$\theta x^{-1}(1 - x)^{\theta - 1}\,x\,\delta x = \theta(1 - x)^{\theta - 1}\,\delta x$. Equation (9.90) follows immediately.

Suppose now that all genes of the allelic type just chosen are removed 
from the population. A second gene is now drawn at random from the pop- 
ulation and its allelic type observed. The frequency of the allelic type of 
this gene among the genes remaining at this stage is also given by (9.90). 
All genes of this second allelic type are now also removed from the popu- 
lation. A third gene is then drawn at random from the genes remaining, 
its allelic type observed, and all genes of this (third) allelic type removed 
from the population. This process is continued indefinitely. At any stage, 
the distribution of the frequency of the allelic type of any gene just drawn 
among the genes left when the draw takes place is given by (9.90). This 
leads to the following representation. Denote by $w_j$ the original population
frequency of the jth allelic type drawn. Then we can write

$$w_1 = x_1, \qquad w_j = (1 - x_1)(1 - x_2) \cdots (1 - x_{j-1})\,x_j, \quad j = 2, 3, \ldots, \tag{9.91}$$

where the $x_j$ are independent random variables, each having the distribution
(9.90). The random vector $(w_1, w_2, \ldots)$ then has the GEM
distribution.
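
The sequential removal scheme translates directly into a short simulation. The Python sketch below is a minimal illustration: it draws each $x_j$ from the beta distribution with parameters 1 and $\theta$, whose density is $\theta(1 - x)^{\theta - 1}$, and multiplies by the fraction of the population not yet removed. The function name, the seed, and the truncation to a fixed number of allelic types are choices made here for illustration.

```python
import random

def gem_weights(theta, n_types, rng):
    """First n_types frequencies w_1, w_2, ... built by repeated removal."""
    weights = []
    remaining = 1.0                        # fraction of genes not yet removed
    for _ in range(n_types):
        x = rng.betavariate(1.0, theta)    # density theta * (1 - x)**(theta - 1)
        weights.append(remaining * x)
        remaining *= 1.0 - x
    return weights

if __name__ == "__main__":
    rng = random.Random(2024)
    w = gem_weights(theta=2.0, n_types=10, rng=rng)
    print([round(value, 4) for value in w])
    print("total frequency of the first 10 types:", round(sum(w), 4))
```

Because the remaining fraction shrinks geometrically on average, a modest number of types usually accounts for almost all of the population.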

All the alleles in the population at any time eventually leave the population,
through the joint processes of mutation and random drift, and any
allele with current population frequency x survives the longest with probability
x. That is, since the GEM distribution was found according to a
size-biased process, it also arises when alleles are labeled according to the
length of their future persistence in the population. Reversibility arguments
then show that the GEM distribution also applies when the alleles in the
population are labeled by their age. In other words, the vector $(w_1, w_2, \ldots)$
can be thought of as the vector of allelic frequencies when alleles are ordered
with respect to their ages in the population (with allele 1 being the
oldest).

The elegance of many age-ordered formulas derives directly from the simplicity
and tractability of the GEM distribution. We now give two examples.
First, the GEM distribution shows immediately that the mean population
frequency of the oldest allele in the population is

$$\theta \int_0^1 x(1 - x)^{\theta - 1}\,dx = \frac{1}{1 + \theta}, \tag{9.92}$$

and more generally that the mean population frequency of the jth oldest
allele in the population is

$$\frac{1}{1 + \theta}\left(\frac{\theta}{1 + \theta}\right)^{j-1}.$$

Second, the probability that a gene drawn at random from the population
is of the type of the oldest allele is the mean frequency of the oldest allele,
namely $1/(1 + \theta)$, as just shown. More generally, the probability that n genes
drawn at random from the population are all of the type of the oldest allele
is

$$\theta \int_0^1 x^n(1 - x)^{\theta - 1}\,dx = \frac{n!}{(1 + \theta)(2 + \theta) \cdots (n + \theta)}.$$

The probability that n genes drawn at random from the population are
all of the same unspecified allelic type is

$$\theta \int_0^1 x^{n-1}(1 - x)^{\theta - 1}\,dx = \frac{(n-1)!}{(1 + \theta)(2 + \theta) \cdots (n + \theta - 1)},$$

in agreement with (3.87). From this, given that n genes drawn at random
are all of the same allelic type, the probability that they are all of the allelic
type of the oldest allele is $n/(n + \theta)$. The similarity of this expression with
that deriving from a Bayesian calculation is of some interest.
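
These integrals are beta functions, so the closed forms and the ratio $n/(n + \theta)$ can be checked with the standard library alone. The Python sketch below is an illustration with hypothetical function names: it evaluates $\theta \int_0^1 x^a (1 - x)^{\theta - 1}\,dx$ as $\theta\,\Gamma(a + 1)\Gamma(\theta)/\Gamma(a + 1 + \theta)$ and compares it with the corresponding products.

```python
import math

def theta_beta_integral(theta, a):
    """theta * integral_0^1 x**a (1 - x)**(theta - 1) dx via log-gamma."""
    log_value = (math.lgamma(a + 1) + math.lgamma(theta)
                 - math.lgamma(a + 1 + theta))
    return theta * math.exp(log_value)

def prob_all_oldest(theta, n):
    """n! / ((1 + theta)(2 + theta)...(n + theta))."""
    prod = 1.0
    for i in range(1, n + 1):
        prod *= i / (i + theta)
    return prod

def prob_all_same(theta, n):
    """(n - 1)! / ((1 + theta)(2 + theta)...(n + theta - 1))."""
    prod = 1.0
    for i in range(1, n):
        prod *= i / (i + theta)
    return prod

if __name__ == "__main__":
    theta, n = 1.5, 6
    print(theta_beta_integral(theta, n), prob_all_oldest(theta, n))
    print(theta_beta_integral(theta, n - 1), prob_all_same(theta, n))
    print(prob_all_oldest(theta, n) / prob_all_same(theta, n), n / (n + theta))
```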

The GEM distribution is, of course, a diffusion approximation, and the 
above results are diffusion approximations. The distribution has a number 
of interesting mathematical properties. It is invariant under size-biased 
sampling, and this property has been used by Hoppe (1987) to derive the 
frequency spectrum (3.95). It also has important properties with respect to 
the concepts of random deletions and noninterference, discussed in Section 
9.5.3, which were also exploited by Hoppe (1986). These properties are 
perhaps of more interest in ecology than in genetics, so we do not develop 
them here.

It will be expected that various exact results hold for the Moran model,
with $\theta$ defined as $2Nu/(1 - u)$. The first of these is an exact representation
of the GEM distribution, analogous to (9.91). This has been provided by
Hoppe (1987). Denote by $N_1, N_2, \ldots$ the numbers of genes of the oldest,
second-oldest, ... alleles in the population. Then $N_1, N_2, \ldots$ can be defined
in turn by

$$N_i = 1 + M_i, \qquad i = 1, 2, \ldots, \tag{9.93}$$

where $M_i$ has a binomial distribution with index $2N - N_1 - N_2 - \cdots - N_{i-1} - 1$
and parameter $x_i$, where $x_1, x_2, \ldots$ are iid continuous random
variables each having the density function (9.90). Eventually the sum
$N_1 + N_2 + \cdots + N_k$ reaches the value 2N and the process then stops, the final
index k being identical to the number $K_{2N}$ of alleles in the population.

It follows directly from this representation that the mean of $N_1$ is

$$1 + (2N - 1)\,\theta \int_0^1 x(1 - x)^{\theta - 1}\,dx = \frac{2N + \theta}{1 + \theta}.$$

The mean of the proportion $N_1/(2N)$ is $1/\{1 + (2N - 1)u\}$, which is very
close to the diffusion approximation $1/\{1 + \theta\}$.

If there is only one allele in the population, so that the population is
monomorphic, this allele must be the oldest one in the population. The
above representation shows that the probability that the oldest allele arises
2N times in the population is

$$\text{Prob}(M_1 = 2N - 1) = \theta \int_0^1 x^{2N-1}(1 - x)^{\theta - 1}\,dx,$$

and this reduces to the monomorphism probability (3.99).

More generally, Kelly (1977) has shown that the complete distribution
of the number of genes of the oldest allele is, for the Moran model,

$$\text{Prob(oldest allele represented by $j$ genes)} = \frac{\theta}{2N}\binom{2N}{j}\binom{2N + \theta - 1}{j}^{-1}. \tag{9.94}$$

The case j = 2N considered above is a particular example of (9.94), and
the mean number $(2N + \theta)/(1 + \theta)$ follows from (9.94).
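
The distribution of the number of genes of the oldest allele can be tabulated numerically. The Python sketch below assumes the form $(\theta/2N)\binom{2N}{j}\binom{2N + \theta - 1}{j}^{-1}$ for j = 1, ..., 2N, which is the beta-binomial distribution implied by the representation $N_1 = 1 + M_1$. The generalized binomial coefficient is evaluated through log-gamma functions, and the function names are hypothetical.

```python
import math

def log_binom(a, j):
    """Log of the generalized binomial coefficient C(a, j), real a > j - 1."""
    return math.lgamma(a + 1) - math.lgamma(j + 1) - math.lgamma(a - j + 1)

def oldest_allele_pmf(theta, two_n):
    """Probability that the oldest allele is represented by j genes."""
    pmf = {}
    for j in range(1, two_n + 1):
        log_p = (math.log(theta / two_n) + log_binom(two_n, j)
                 - log_binom(two_n + theta - 1, j))
        pmf[j] = math.exp(log_p)
    return pmf

if __name__ == "__main__":
    theta, two_n = 2.0, 20
    pmf = oldest_allele_pmf(theta, two_n)
    mean = sum(j * p for j, p in pmf.items())
    print("total probability:", sum(pmf.values()))
    print("mean:", mean, "compare with", (two_n + theta) / (1 + theta))
```

The probabilities should sum to one and the mean should match $(2N + \theta)/(1 + \theta)$, which gives two independent checks on the assumed form.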

We now turn again to approximations deriving from diffusion methods.
A question of some interest is to find the probability that the oldest allele
in the population is also the most frequent. By reversibility arguments this
is also the probability that the most frequent allele in the population will
survive the longest into the future, and in turn this is the mean of the
frequency of the most frequent allele. Unfortunately, the distribution of
the frequency of the most frequent allele is the user-unfriendly Poisson-Dirichlet
distribution, and no exact results are available. It is easy to see
from the form of the Poisson-Dirichlet distribution that a lower bound for
the mean frequency of the most frequent allele is $(1/2)^{\theta}$, which is useful
for small $\theta$ but not of much value for larger $\theta$. Watterson and Guess (1977)
give numerical calculations for a range of $\theta$ values and also provide the
upper bound $1 - \theta(1 - \theta)\log 2$. For example, when $\theta = 1$ this mean
is 0.624, which may be compared with the mean frequency of the oldest
allele (which must be less than the mean frequency of the most frequent
allele) of 0.5.

We now turn to “age” questions. Some for these follow immediately from 
our previous calculations. For example, the mean time for all alleles existing 
in the population at any time to leave the population is given in (9.5), and 
by reversibility this is the mean time, into the past, that the oldest of 
these originally arose by mutation. This is then the mean age of the oldest 
allele in the population, given on a “generations” basis. Since we refer to 
this calculation with reference to the mean age of the oldest allele in the 
population, we repeat it here, with this new interpretation: 



mean age of oldest allele 



2 N 



E 



AN 

j{j + 0- 1) 



generations. 



(9.95) 



In the case $\theta = 2$, this mean age is very close to $4N - 2$, that is, to the
conditional mean fixation time (5.36). The exact result corresponding to
(9.95) for the Moran model is given in (9.10), or equivalently in (9.11), being
almost exactly $4N^2$ birth and death events when $\theta = 2Nu/(1 - u) = 2$.
This is close to the conditional mean fixation time given in (3.54), and the
reason for these identities is discussed below (9.5).
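The closeness of (9.95) to $4N - 2$ when $\theta = 2$ can be checked numerically. The following is a minimal sketch; the function name `mean_age_oldest_allele` is a hypothetical helper, not notation from the text.

```python
def mean_age_oldest_allele(N: int, theta: float) -> float:
    """Evaluate (9.95): sum over j = 1..2N of 4N / (j (j + theta - 1)), in generations."""
    return sum(4 * N / (j * (j + theta - 1)) for j in range(1, 2 * N + 1))


N = 1000
age = mean_age_oldest_allele(N, theta=2.0)
# For theta = 2 the sum telescopes to 4N * 2N / (2N + 1), which exceeds 4N - 2
# by only 2 / (2N + 1).
print(age, 4 * N - 2)
```

For $\theta = 2$ the summand is $4N\{1/j - 1/(j+1)\}$, so the sum telescopes, which is why the agreement with $4N - 2$ is so close.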

In employing the argument leading to (9.95) we in effect use a result of 
Watterson and Guess (1977) and Kelly (1977), stating that not only the 
mean age of the oldest allele, but indeed the entire probability distribu- 
tion of its age, is independent of its current frequency and indeed of the 
frequency of all alleles in the population. 

We next ask: if an allele is observed in the population with frequency
$p$, what is its mean age? By reversibility, this is the mean time $\bar{t}(p)$ that
it persists in the population, and in the diffusion approximation to the
Wright-Fisher model this is found immediately from (3.20) as

$$\sum_{j=1}^{\infty} \frac{4N}{j(j + \theta - 1)} \left(1 - (1 - p)^j\right) \ \text{generations.} \tag{9.96}$$

This is clearly a generalization of the expression in (9.95), since if $p = 1$,
only one allele is present in the population, and it must then be the oldest
allele. A parallel exact calculation for the Moran model follows from the
mean persistence time found eventually using (2.160) and (3.57).

A question whose answer follows from the above calculation is the following:
If a gene is taken at random from the population, what is the diffusion
approximation for the mean age of its allelic type? With a change of notation,
the density function of the frequency $p$ of the allelic type of the
randomly chosen gene is, from (9.90), $f(p) = \theta(1 - p)^{\theta - 1}$. The mean age
$\bar{t}(p)$ of an allele with frequency $p$ is, by reversibility, given by (3.20). The
required mean age is

$$\theta \int_0^1 \bar{t}(p)\,(1 - p)^{\theta - 1}\,dp, \tag{9.97}$$

and use of (3.20) for $\bar{t}(p)$ shows that this reduces to $2/\theta$ diffusion time units,
or for the Wright-Fisher model, $1/u$ generations. This conclusion may also
be derived by looking backward to the past and using the coalescent arguments
given in Chapter 10. However, that machinery is not needed here, since
the result is immediate. Looking backward to the past, we see that the
probability that the original mutation creating the allelic type of the gene
in question occurred $j$ generations in the past is $u(1 - u)^{j-1}$, $j = 1, 2, \ldots$,
and the mean of this (geometric) distribution is $1/u$.
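The reduction of (9.97) to $2/\theta$ can be sketched as follows. The sketch assumes that (3.20) gives the mean persistence time, in diffusion time units, as $\bar{t}(p) = \sum_{j \ge 1} 2\{j(j+\theta-1)\}^{-1}\{1 - (1-p)^j\}$; this form is inferred from the surrounding formulas rather than quoted from (3.20). Using $\theta\int_0^1 (1-p)^{\theta-1}\,dp = 1$ and $\theta\int_0^1 (1-p)^{j+\theta-1}\,dp = \theta/(j+\theta)$,

$$\begin{aligned}
\theta \int_0^1 \bar{t}(p)\,(1-p)^{\theta-1}\,dp
&= \sum_{j=1}^{\infty} \frac{2}{j(j+\theta-1)} \left(1 - \frac{\theta}{j+\theta}\right) \\
&= \sum_{j=1}^{\infty} \frac{2}{(j+\theta-1)(j+\theta)} \\
&= 2 \sum_{j=1}^{\infty} \left(\frac{1}{j+\theta-1} - \frac{1}{j+\theta}\right) = \frac{2}{\theta}.
\end{aligned}$$

With the Wright-Fisher scaling of $2N$ generations per diffusion time unit and $\theta = 4Nu$, this is $4N/\theta = 1/u$ generations, in agreement with the geometric-distribution argument.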

An exact calculation parallel to this is possible for the Moran model, 
using the exact frequency spectrum (3.102) and the exact mean age deriving 
from (3.57). However, a direct argument parallel to that just given for the 
Wright-Fisher model shows that the exact mean time, measured in birth
and death events, is $2N/u$.

We turn now to sample properties, which are in practice more important
than population properties. The most important sample distribution concerns
the frequencies of the alleles in the sample when ordered by age. This
distribution was obtained by Donnelly and Tavaré (1986), who found the
probability that the number $K_n$ of alleles in the sample takes the value $k$,
and that the age-ordered numbers of these alleles in the sample are (in age
order) $n_{(1)}, n_{(2)}, \ldots, n_{(k)}$. This probability is

$$\frac{\theta^k (n-1)!}{S_n(\theta)\, n_{(k)} \left(n_{(k)} + n_{(k-1)}\right) \cdots \left(n_{(k)} + n_{(k-1)} + \cdots + n_{(2)}\right)}, \tag{9.98}$$

where $S_n(\theta)$ is defined below (3.83). This formula can be found in several
ways, one being as the size-biased version of (3.88).

The expression (9.98) is exact for the Moran model with $\theta$ defined as
$2Nu/(1 - u)$.

Several results concerning the oldest allele in the sample can be found
from this formula, or in some cases more directly by other methods. For
example, the probability that the oldest allele in the sample is represented
by $j$ genes in the sample is (Kelly (1976))

$$\frac{\theta}{n} \binom{n}{j} \binom{n + \theta - 1}{j}^{-1}. \tag{9.99}$$

This is identical to the expression (9.94) if $2N$ is replaced by $n$ in the latter.

Further results provide connections between the oldest allele in the sample
and the oldest allele in the population. Some of these results are exact
for a Moran model, and others are the corresponding diffusion approximations.
For example, Kelly (1976) showed that in the Moran model, the
probability that the oldest allele in the population is observed at all in the
sample is $n(2N + \theta)/[2N(n + \theta)]$. This is equal to 1, as it must be, when
$n = 2N$, and when $n = 1$ it reduces to the probability, found above, that a
randomly selected gene is of the oldest allelic type in the population. The diffusion
approximation to this probability, found by letting $N \to \infty$, is $n/(n + \theta)$.

A further result is that in the Moran model, the probability that a gene
seen $j$ times in the sample is of the oldest allelic type in the population is
$j(2N + \theta)/[2N(n + \theta)]$. Letting $N \to \infty$, the diffusion approximation for
this probability is $j/(n + \theta)$. When $n = j$ this is $j/(j + \theta)$, a result found
above by other methods.

Donnelly (1986) provides further formulas extending these. He showed,
for example, that the probability that the oldest allele in the population is
observed $j$ times in the sample is

$$\frac{\theta}{n + \theta} \binom{n}{j} \binom{n + \theta - 1}{j}^{-1}, \quad j = 0, 1, 2, \ldots, n. \tag{9.100}$$

This is, of course, closely connected to the Kelly result (9.99). For the case
$j = 0$ this probability is $\theta/(n + \theta)$, confirming the complementary probability
$n/(n + \theta)$ found above. Conditional on the event that the oldest allele
in the population does appear in the sample, a straightforward calculation
using (9.100) shows that the resulting conditional probability is identical to
that in (9.99).
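The relation between (9.99) and (9.100) can be checked numerically. This is a sketch only: the helper names are hypothetical, and the binomial coefficient with the non-integer upper argument $n + \theta - 1$ is assumed to be the usual generalized coefficient $x(x-1)\cdots(x-j+1)/j!$.

```python
from math import comb


def gen_binom(x: float, j: int) -> float:
    """Generalized binomial coefficient C(x, j) for real x and integer j >= 0."""
    out = 1.0
    for i in range(j):
        out *= (x - i) / (i + 1)
    return out


def p_oldest_in_sample(n: int, j: int, theta: float) -> float:
    """(9.99): the oldest allele in the sample is represented j times, j = 1..n."""
    return theta / n * comb(n, j) / gen_binom(n + theta - 1, j)


def p_oldest_in_population(n: int, j: int, theta: float) -> float:
    """(9.100): the oldest allele in the population is seen j times, j = 0..n."""
    return theta / (n + theta) * comb(n, j) / gen_binom(n + theta - 1, j)


n, theta = 10, 1.5
total_99 = sum(p_oldest_in_sample(n, j, theta) for j in range(1, n + 1))
total_100 = sum(p_oldest_in_population(n, j, theta) for j in range(0, n + 1))
p_seen = 1 - p_oldest_in_population(n, 0, theta)  # probability of appearing at all
print(total_99, total_100, p_seen, n / (n + theta))
```

Both distributions sum to 1, and dividing (9.100) by the probability `p_seen` that the oldest allele in the population appears in the sample recovers (9.99) term by term.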

Griffiths and Tavaré (1998) give the Laplace transform of the distribution
of the age of an allele observed $b$ times in a sample of $n$ genes, together with
a limiting Laplace transform for the case in which $\theta$ approaches 0. These
results show, for the Wright-Fisher model, that the diffusion approximation
for the mean age of such an allele is

$$\sum_{j=1}^{\infty} \frac{4N}{j(j - 1 + \theta)} \left(1 - \frac{(n - b + \theta)_{(j)}}{(n + \theta)_{(j)}}\right) \tag{9.101}$$

generations, where $a_{(j)}$ is defined as $a_{(j)} = a(a+1) \cdots (a+j-1)$. This is the
sample analogue of the population expression in (9.96), and it converges to
(9.96) as $n \to \infty$ with $b = np$.

In the particular case $\theta = 2$, which we have considered several times
above, the expression in (9.101) simplifies to

$$\frac{4Nb}{n - b} \sum_{j=b+1}^{n} j^{-1}. \tag{9.102}$$

Under the limiting process $n \to \infty$ with $b = np$ this approaches the expression
in (3.22). This is as expected, since when $\theta = 2$, (3.22) is by reversibility
arguments also the mean age of an allele observed with frequency $p$ in the
population.
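The convergence of the sample mean age to the population value can be illustrated by evaluating the series (9.101) directly. This is a sketch under stated assumptions: the series is truncated after `terms` terms, the function names are hypothetical, and the population value for $\theta = 2$ is taken to be $-4Np\log p/(1-p)$, the value obtained by summing (9.96) at $\theta = 2$ and assumed here to be the expression in (3.22).

```python
from math import log


def mean_age_in_sample(n, b, theta, N=1.0, terms=200_000):
    """Truncated evaluation of the series (9.101), in generations."""
    total, ratio = 0.0, 1.0
    for j in range(1, terms + 1):
        # ratio = (n - b + theta)_(j) / (n + theta)_(j), built up one factor at a time
        ratio *= (n - b + theta + j - 1) / (n + theta + j - 1)
        total += (1.0 - ratio) / (j * (j - 1 + theta))
    return 4 * N * total


def mean_age_in_population(p, N=1.0):
    """Assumed theta = 2 population value: -4N p log(p) / (1 - p) generations."""
    return -4 * N * p * log(p) / (1 - p)


p = 0.3
for n in (20, 200, 2000):
    b = round(n * p)
    print(n, mean_age_in_sample(n, b, theta=2.0), mean_age_in_population(p))
```

The truncation error is at most about $4N/\text{terms}$ when $\theta \ge 1$, since each omitted term is bounded by $4N/\{j(j-1+\theta)\}$.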

Our final calculation concerns the mean age of the oldest allele in the
sample. For the Wright-Fisher model the diffusion approximation for this
mean age is

$$4N \sum_{j=1}^{n} \frac{1}{j(j + \theta - 1)}. \tag{9.103}$$

For the case $n = 2N$ this is the value given in (9.5), and for the case $n = 1$
it reduces to the value $1/u$ given above. The corresponding exact result for
the Moran model is

$$2N(2N + \theta) \sum_{j=1}^{n} \frac{1}{j(j + \theta - 1)} \tag{9.104}$$

birth and death events, with (of course) $\theta$ defined as $2Nu/(1 - u)$. When
$n = 1$ this reduces to the calculation $2N/u$ given above. When $n = 2N$ it
is identical to (9.11) and, less obviously, to the expression given in (9.10).
The expression in (9.104) may be written equivalently as

$$\sum_{j=1}^{n} \frac{1}{v_j + w_j}, \tag{9.105}$$

where

$$v_j = \frac{ju}{2N}, \qquad w_j = \frac{j(j - 1)(1 - u)}{(2N)^2}. \tag{9.106}$$


These expressions follow the pattern of (9.12) and (9.13). In Chapter 10 
we shall explain why the mean age of the oldest allele in a sample can be 
expressed in the form defined by (9.105) and (9.106) and why the mean 
age of the oldest allele in the population can similarly be expressed in the 
form defined by (9.12) and (9.13). These are found by an analysis of the 
coalescent process, which so far has been kept in the background. It is 
therefore now time to turn to it. 
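The equivalence between the sum (9.105), with $v_j$ and $w_j$ as in (9.106), and the Moran-model value $2N(2N+\theta)\sum_{j=1}^{n} 1/\{j(j+\theta-1)\}$ with $\theta = 2Nu/(1-u)$ can be confirmed in exact rational arithmetic. The function names below are hypothetical, and the parameter values are arbitrary illustrations.

```python
from fractions import Fraction


def moran_mean_age_direct(n, N, u):
    """2N (2N + theta) * sum_{j=1}^{n} 1 / (j (j + theta - 1)), theta = 2Nu / (1 - u)."""
    theta = 2 * N * u / (1 - u)
    return 2 * N * (2 * N + theta) * sum(1 / (j * (j + theta - 1)) for j in range(1, n + 1))


def moran_mean_age_rates(n, N, u):
    """The sum (9.105) of 1 / (v_j + w_j), with v_j and w_j as in (9.106)."""
    total = Fraction(0)
    for j in range(1, n + 1):
        v_j = Fraction(j) * u / (2 * N)
        w_j = Fraction(j * (j - 1)) * (1 - u) / (2 * N) ** 2
        total += 1 / (v_j + w_j)
    return total


N, u = 10, Fraction(1, 100)
for n in (1, 5, 2 * N):
    print(n, moran_mean_age_direct(n, N, u) == moran_mean_age_rates(n, N, u))
```

The identity rests on $v_j + w_j = j(1-u)(j + \theta - 1)/(2N)^2$ together with $2N/(1-u) = 2N + \theta$.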




10 

Looking Backward in Time: The 
Coalescent 



10.1 Introduction 

It is remarkable that the elegant Watterson formulation for the probability
distribution for $S_n$, given implicitly by (9.52), together with the perceptive
remark following it, as well as the elegance and simplicity of many of
the “age” formulas in Section 9.9, were not seized upon and
investigated at greater length immediately after they appeared, to determine
why formulas of these elegant forms arise. Since these formulas relate
to the past history of the population, historical factors must explain them.
Similarly, the unequal frequencies that tend to arise even among selectively 
equivalent alleles, as shown, for example, by (3.83), must be explained by 
historical factors: The oldest allele in a sample will tend to have a higher 
frequency than a newly arisen mutant allele. It fell to Kingman (1982a,b,c) 
to recognize the importance of these historical factors, to see that they are 
most simply approached by a retrospective analysis of the ancestry of the 
genes in a sample, to introduce the concept of the coalescent, which pro- 
vides the framework for this retrospective analysis, and then to lay down 
the basic mathematical machinery of the coalescent process. 

The idea of the coalescent was, however, “in the air” at the time: See,
for example, Tajima (1983) and Hudson (1983). Nor should one fail to
acknowledge the pioneering work of Malécot (1948), which introduced and
exploited the concept of “looking backward in time” to derive important
results in population genetics theory.

In this chapter we give a brief introduction to the main ideas of the
coalescent. We focus on the simple case in which there are no complications
due to selection, recombination, geographical structure, fluctuating population
sizes, and so on. The coalescent also leads to significant advances in
statistical inference procedures in population genetics. Again, these are not
considered in detail here. Definitive reviews of the extensions to the theory
needed to handle the complications discussed above, and of inference
questions in the coalescent, are provided respectively by Nordborg (2001),
Griffiths and Tavaré (2003), and Tavaré (2004). Our aim in this chapter
is to give an overview of the more elementary properties of the coalescent
process, with a focus on demonstrating how several of the formulas arrived
at in Section 9.9 are more naturally arrived at by coalescent methods. A
far more complete discussion of the coalescent will be given in Volume II.



10.2 Competing Poisson and Geometric Processes 



It is convenient to start with two technical results, one of which will be
relevant for diffusion approximations in the coalescent, while the other will
be relevant for exact Moran model calculations.

We consider first a Poisson process in which events occur independently
and randomly in time, with the probability of an event in $(t, t + \delta t)$ being
$a\,\delta t$. (Here and throughout we ignore terms of order $(\delta t)^2$.) We call $a$ the
rate of the process. Standard Poisson process theory shows that the density
function of the time between events, and until the first event, is
$f(x) = a e^{-ax}$, and thus that the mean time until the first event, and also between
events, is $1/a$.

Consider now two such processes, process (a) and process (b), with respective
rates $a$ and $b$. Various results follow almost immediately from
standard Poisson process theory. Given that an event occurs, the probability
that it arises in process (a) is $a/(a + b)$. The mean number of “process
(a)” events to occur before the first “process (b)” event occurs is $a/b$. More
generally, the probability that $j$ “process (a)” events occur before the first
“process (b)” event occurs is

$$\frac{b}{a + b} \left(\frac{a}{a + b}\right)^j, \quad j = 0, 1, \ldots. \tag{10.1}$$

The mean time for the first event to occur under one or the other process is
$1/(a + b)$. Given that this first event occurs in process (a), the conditional
mean time until this first event occurs is equal to the unconditional mean
time $1/(a + b)$. The same conclusion applies if the first event occurs in
process (b).
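These properties of two competing Poisson processes are easy to illustrate by simulation. The following sketch draws independent exponential waiting times for the first event in each process; the function name, the rates $a = 2$, $b = 1$, and the fixed seed are illustrative assumptions.

```python
import random


def simulate_competing(a, b, runs=200_000, seed=1):
    """Estimate P(first event is in process (a)), E(time), and E(time | process (a) first)."""
    rng = random.Random(seed)
    first_is_a = 0
    time_sum = 0.0
    time_sum_given_a = 0.0
    for _ in range(runs):
        t_a = rng.expovariate(a)  # waiting time to the first process (a) event
        t_b = rng.expovariate(b)  # waiting time to the first process (b) event
        t = min(t_a, t_b)
        time_sum += t
        if t_a < t_b:
            first_is_a += 1
            time_sum_given_a += t
    return first_is_a / runs, time_sum / runs, time_sum_given_a / first_is_a


p_a, mean_t, mean_t_given_a = simulate_competing(a=2.0, b=1.0)
# Theory: a / (a + b) = 2/3, and both mean times equal 1 / (a + b) = 1/3.
print(p_a, mean_t, mean_t_given_a)
```

The point of the last estimate is that conditioning on which process produced the first event does not change the mean waiting time, a fact used repeatedly in Section 10.5.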

We now turn to the geometric distribution. We consider a sequence of
independent trials and two events, event A and event B. The probability
that one of the events A and B occurs at any trial is $a + b$. The events A
and B cannot both occur at the same trial, and given that one of these
events occurs at trial $i$, the probability that it is an A event is $a/(a + b)$.

We are interested in the random number of trials until the first event
occurs. This number is a geometric random variable taking the value $i$,
$i = 1, 2, \ldots$, with probability $(1 - a - b)^{i-1}(a + b)$. The mean of this number
is $1/(a + b)$. The probability that the first event to occur is an A event
is $a/(a + b)$. Given that the first event to occur is an A event, the mean
number of trials before the event occurs is $1/(a + b)$. In other words, this
mean number of trials applies whichever event occurs first. The similarity
of properties between the Poisson process and the geometric distribution
is evident.



10.3 The Coalescent Process 

We start by describing the coalescent as a quite abstract process, not as- 
sociated with any of the specific concrete evolutionary models discussed in 
previous chapters, and later we will see how this process can be used to find 
properties of the past history of a population whose evolution is described 
by these models. 

We consider the ancestry of a sample of $n$ genes taken at the present time.
Since our interest is in the ancestry, we consider a process moving backward
in time, and introduce a notation acknowledging this. We consistently use
the notation $\tau$ for a time in the past before the sample was taken, so that
if $\tau_2 > \tau_1$, then $\tau_2$ is further back in the past than is $\tau_1$.

We describe the common ancestry of the sample of $n$ genes at any time $\tau$
through the concept of an equivalence class. Two genes in the sample of $n$
are in the same equivalence class at time $\tau$ if they have a common ancestor
at this time. Equivalence classes are denoted by parentheses: Thus if $n = 8$
and at time $\tau$ genes 1 and 2 have one common ancestor, genes 4 and 5 a
second, and genes 6 and 7 a third, and none of the three common ancestors
are identical, the equivalence classes at time $\tau$ are

$$(1,2),\ (3),\ (4,5),\ (6,7),\ (8). \tag{10.2}$$

Such a time $\tau$ is shown in Figure 10.1.

We call any such set of equivalence classes an equivalence relation,
and denote any such equivalence relation by a Greek letter. As two
particular cases, at time $\tau = 0$ the equivalence relation is
$\phi_1 = \{(1), (2), (3), (4), (5), (6), (7), (8)\}$, and at the time of the most recent
common ancestor of all eight genes, the equivalence relation is
$\phi_n = \{(1, 2, 3, 4, 5, 6, 7, 8)\}$. The Kingman coalescent process is a description of
the details of the ancestry of the $n$ genes moving from $\phi_1$ to $\phi_n$.

We now turn to a more detailed description of this process. Let $\xi$ be
some equivalence relation, and $\eta$ some equivalence relation that can be
found from $\xi$ by amalgamating two of the equivalence classes in $\xi$. Such
an amalgamation is called a coalescence, and the process of successive such
amalgamations is called the coalescence process. It is assumed that, if terms
of order $(\delta\tau)^2$ are ignored,

$$\text{Prob}(\text{process in } \eta \text{ at time } \tau + \delta\tau \mid \text{process in } \xi \text{ at time } \tau) = \delta\tau, \tag{10.3}$$

and if $j$ is the number of equivalence classes in $\xi$,

$$\text{Prob}(\text{process in } \xi \text{ at time } \tau + \delta\tau \mid \text{process in } \xi \text{ at time } \tau) = 1 - \frac{j(j - 1)}{2}\,\delta\tau. \tag{10.4}$$

This might seem to be a heavy-handed description of the way in which the
ancestry of a sample of genes traces back to, and coalesces at, a common
ancestor. Indeed, many coalescent results can be found without the full
description of the process just given. However, as we see below, the derivation
of the sampling formula (3.83) requires this full description.
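The process defined by (10.3) and (10.4) can be simulated directly on the equivalence classes. This is a minimal sketch, not an implementation taken from the text: the function name `simulate_coalescent` and the representation of an equivalence relation as a list of tuples are assumptions made for illustration.

```python
import random


def simulate_coalescent(n, seed=0):
    """Return a list of (tau, equivalence relation) pairs from phi_1 to phi_n."""
    rng = random.Random(seed)
    classes = [(i,) for i in range(1, n + 1)]  # phi_1: every gene in its own class
    tau = 0.0
    history = [(tau, list(classes))]
    while len(classes) > 1:
        j = len(classes)
        # (10.4): the total rate of leaving a state with j classes is j (j - 1) / 2
        tau += rng.expovariate(j * (j - 1) / 2)
        # (10.3): each pair of classes is equally likely to amalgamate
        i1, i2 = sorted(rng.sample(range(j), 2))
        merged = tuple(sorted(classes[i1] + classes[i2]))
        classes = [c for k, c in enumerate(classes) if k not in (i1, i2)] + [merged]
        history.append((tau, list(classes)))
    return history


for tau, relation in simulate_coalescent(8):
    print(round(tau, 3), relation)
```

Each pass through the loop is one coalescence, so a sample of $n$ genes reaches its most recent common ancestor after exactly $n - 1$ steps.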



10.4 The Coalescent and Its Relation to 
Evolutionary Genetic Models 

The main purpose of the coalescent is to provide results, either exact or
approximate, for the evolutionary models considered so far in this book
and to give a coherent framework within which to view these results.
Kingman (1982a,b,c) showed in his path-breaking work introducing the
coalescent that, provided several straightforward conditions are fulfilled,
the coalescent provides excellent approximations to quantities of evolutionary
interest, the approximations improving as the population size increases.
We discuss several of these in Section 10.5.

Kingman focused on the Cannings infinitely many alleles model outlined
in Section 3.6.3, of which the Wright-Fisher model is a particular case. He
considered a sequence of such models, one for each population size $N$. It
is thus convenient to denote the (random) number of offspring genes from
any one parental gene by $\nu_N$ in a population of $N$ individuals ($2N$ genes),
and to denote the variance of $\nu_N$ by $\sigma_N^2$. Kingman then showed, under the
requirements that $\sigma_N^2$ converge to a positive finite limit $\sigma^2$ as $N \to \infty$ and
that the supremum of all moments of $\nu_N$ remain finite under the same limit,
that the ancestral properties of a sample of fixed size $n$ in the Cannings
model converge, as $N \to \infty$, to those of the coalescent.

The Wright-Fisher model is a particular case of the Cannings model,
and for it $\sigma_N^2 = 1 - (2N)^{-1}$, so that the first requirement holds, and it is
equally easy to check that the second requirement holds. There are some
extreme Cannings models for which one or other of the requirements listed
above does not hold, but these seldom arise in practice.

Of course, any coalescent result will always be an approximation for 
the corresponding Wright-Fisher, or more generally Cannings, result. One 
reason for this is that the coalescent is a continuous-time process, while the 
Wright-Fisher and Cannings models are discrete-time processes. In this, the 
coalescent process is similar to a diffusion process, which also takes place 
in continuous time. The similarity goes further: In effect, coalescent results 
are diffusion approximation results. As with time calculations derived from 
diffusion processes, time calculations derived from the coalescent process 
must be multiplied by a scaling factor to be brought to a “generations” 
basis. For the Wright-Fisher model this scaling factor is 2N. 

A more important reason why coalescent results apply immediately only 
to samples of genes is that in the Wright-Fisher and Cannings models, 
several coalescent events can occur simultaneously, whereas this does not 
happen in the coalescent process. For a fixed sample size this becomes
less and less likely in the ancestry of a sample as $N \to \infty$. In the entire
population, however, simultaneous coalescences can be expected, so that 
coalescent results may not be taken over without further consideration to 
describe population properties. Despite this, we will find that some formal 
coalescent calculations are surprisingly accurate for population quantities. 
The reason for this will be given in the next section. 

We will later describe a discrete-time coalescent process for the Moran 
model, which will allow exact calculations to be made, thus explaining the 
many exact “time” and “age” results for this model found in Chapter 9, 
holding both for a sample of genes and also for the entire set of genes in
the population.



10.5 Coalescent Calculations: Wright-Fisher
Models

In this section we consider various calculations arising from the coalescent 
process, and use them as approximations for results for the Wright-Fisher 
evolutionary model. Many of these will result in values agreeing with diffu- 
sion approximations found in Chapter 9, and the coalescent process often 
provides the simplest way of arriving at these results. This agreement con- 
firms the claim that coalescent results are in effect diffusion approximation 
results for this model. We use the coalescent time scale in the calculations 
and then convert the results found to a Wright-Fisher time scale at the 
end of the analysis. 

We consider first the coalescent process on its own. This process in effect
consists of a sequence of $n - 1$ Poisson processes, with respective rates
$j(j - 1)/2$, $j = n, n - 1, \ldots, 2$, describing the Poisson process rate at which
two of these classes amalgamate when there are $j$ equivalence classes in the
coalescent. Thus the rate $j(j - 1)/2$ applies when there are $j$ ancestors of
the genes in the sample for $j < n$, and the rate $n(n - 1)/2$ applies for the
sample itself.

The Poisson process theory outlined in Section 10.2 shows that the time
$T_j$ to move from an ancestry consisting of $j$ genes to one consisting of $j - 1$
genes has an exponential distribution with mean $2/\{j(j - 1)\}$. Since the
total time required to go back from the contemporary sample of genes to
their most recent common ancestor is the sum of the times required to go
from $j$ to $j - 1$ ancestor genes, $j = 2, 3, \ldots, n$, the mean $E(T_{MRCAS})$ is,
immediately,

$$E(T_{MRCAS}) = 2 \sum_{j=2}^{n} \frac{1}{j(j - 1)} = 2\left(1 - \frac{1}{n}\right). \tag{10.5}$$

For large $n$ this mean time is essentially 2 coalescent time units, and it requires a
multiplicative scaling factor of $2N$ to convert to a “generations” basis when
applied to the Wright-Fisher model.

It is clear from (10.5) that about half this mean time relates to the final
coalescence of two lines of ascent into one. This observation gives some idea
of the shape of the coalescent tree: The long arms tend to arise when there
is a very small number of genes in the ancestry of the sample.

The times $T_j$, $j = 2, 3, \ldots, n$, are independent, so that the variance of
$T_{MRCAS}$ is the sum of the variances of the $T_j$. Standard calculations show
that, for large $n$, this is approximately $4\pi^2/3 - 12$, or about 1.16, (squared) time units.
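The mean in (10.5) and the limiting variance can be checked with a few lines of code. The sketch below uses the fact that an exponential random variable with mean $m$ has variance $m^2$; the function names are hypothetical.

```python
from fractions import Fraction
from math import pi


def mean_tmrca(n):
    """Sum of the means 2 / (j (j - 1)) of the independent times T_j, j = 2..n."""
    return sum(Fraction(2, j * (j - 1)) for j in range(2, n + 1))


def var_tmrca(n):
    """Sum of the variances (2 / (j (j - 1)))**2 of the exponential times T_j."""
    return sum((2.0 / (j * (j - 1))) ** 2 for j in range(2, n + 1))


print(mean_tmrca(10))                  # exact value 2 (1 - 1/10)
print(1 / mean_tmrca(10))              # share of the mean due to the final coalescence T_2
print(var_tmrca(2000), 4 * pi ** 2 / 3 - 12)
```

Since $E(T_2) = 1$, the share of the mean contributed by the final coalescence is $1/\{2(1 - 1/n)\}$, which is the "about half" referred to above.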

The complete distribution of $T_{MRCAS}$ is also known (Tavaré (2004)).
However, the expression is complicated and we do not reproduce it here,
other than to note the simple inequalities

$$e^{-t} < \text{Prob}(T_{MRCAS} > t) < 3e^{-t}.$$

If the above theory were to apply to the entire population of genes in a
Wright-Fisher model, the mean $E(T_{MRCAP})$ of the total time to arrive at
the most recent ancestor gene of all the genes in the population (MRCAP)
would be

$$E(T_{MRCAP}) = 4N - 2 \tag{10.6}$$

generations. Although coalescent theory does not apply directly to the entire
population, the mean number of generations given in (10.6) is correct.
The reason for this is implicit in an observation made above, that the long
arms in any coalescent process tend to arise when the number of genes in
the ancestry of the genes considered is small, and for such small numbers
the assumptions for the coalescent process hold.

The conclusion (10.6) can also be reached by reversibility arguments. We
may regard the MRCAP gene as one that is certain to fix in the current
generation. Given that a certain allele appears with only one representing
gene, the mean number of generations until it eventually fixes in the population,
given that eventual fixation does occur, is $4N - 2$ generations, as is
shown by (3.12). This is identical to the expression in (10.6).




Figure 10.2. The coalescent with mutations 

We now introduce mutation, and suppose that the probability that any
gene mutates in the time interval $(\tau, \tau + \delta\tau)$ is $(\theta/2)\,\delta\tau$. All mutants are
assumed to be of new allelic types. Following the coalescent paradigm, we
trace back the ancestry of a sample of $n$ genes to the mutation forming the
oldest allele in the sample. As we go backward in time along the coalescent,
we shall encounter from time to time a “defining event”, taken either as a
coalescence of two lines of ascent into a common ancestor or a mutation in
one or other of the lines of ascent. Figure 10.2 describes such an ancestry,
identical to that of Figure 10.1 but with crosses to indicate mutations.

We exclude from further tracing back any line in which a mutation oc- 
curs, since any mutation occurring further back in any such line does not 
appear in the sample. Thus any such line may be thought of as stopping 
at the mutation, as shown in Figure 10.3 (describing the same ancestry as 
that in Figure 10.2). 




Figure 10.3. Tracing back to, and stopping at, mutational events

If at time $\tau$ there are $j$ ancestors of the $n$ genes in the sample, the
probability that a defining event occurs in $(\tau, \tau + \delta\tau)$ is

$$\frac{1}{2} j(j - 1)\,\delta\tau + \frac{1}{2} j\theta\,\delta\tau = \frac{1}{2} j(j + \theta - 1)\,\delta\tau, \tag{10.7}$$

the first term on the left-hand side arising from the possibility of a
coalescence of two lines of ascent, and the second from the possibility of a
mutation.

If a defining event is a coalescence of two lines of ascent, the number
of lines of ascent clearly decreases by 1. The fact that if a defining event
arises from a mutation we exclude any further tracing back of the line of
ascent in which the mutation arose implies that the number of lines of
ascent also decreases by 1. Thus at any defining event the number of lines
of ascent considered in the tracing back process decreases by 1. Given a
defining event leading to $j$ genes in the ancestry, the Poisson process theory
of Section 10.2 shows that, going backward in time, the mean time until
the next defining event occurs is $2/\{j(j + \theta - 1)\}$, and that the same mean
time applies when we restrict attention to those defining events determined
by a mutation.

Thus starting with the original sample and continuing up the ancestry
until the mutation forming the oldest allele in the sample is reached, we
find that the mean age of the oldest allele in the sample is

$$2 \sum_{j=1}^{n} \frac{1}{j(j + \theta - 1)} \tag{10.8}$$

coalescent time units. If the value in (10.8) is multiplied by the time-scale
factor $2N$, the resulting expression is identical to that in (9.103). It is
interesting that this mean was found by looking backward in time, whereas
(9.103) ultimately derives from a calculation looking forward in time.

This time backward until the mutation forming the oldest allele in the
sample, whose mean is given in (10.8), does not necessarily trace back to,
and past, the most recent common ancestor of the genes in the sample
(MRCAS), and will do so only if the allelic type of the MRCAS is represented
in the sample. This observation can be put in quantitative terms
by comparing the mean time to the MRCAS given in (10.5) to the expression in (10.8). For
small $\theta$, the age of the oldest allele will tend to exceed the time back to
the MRCAS, while for large $\theta$, the converse will tend to be the case. The
case $\theta = 2$ appears to be a borderline one: For this value, the expressions
in (10.5) and (10.8) differ only by a term of order $n^{-2}$. Thus for this value
of $\theta$, we expect the oldest allele in the sample to have arisen at about the
same time as the MRCAS. It is for this reason that the value $\theta = 2$ has
been used in several calculations given above.
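The borderline role of $\theta = 2$ can be made concrete by evaluating (10.5) and (10.8) exactly. The function names below are hypothetical, and the values of $\theta$ and $n$ are arbitrary illustrations.

```python
from fractions import Fraction


def mean_time_to_mrcas(n):
    """(10.5): 2 * sum_{j=2}^{n} 1 / (j (j - 1)), in coalescent time units."""
    return 2 * sum(Fraction(1, j * (j - 1)) for j in range(2, n + 1))


def mean_age_oldest_in_sample(n, theta):
    """(10.8): 2 * sum_{j=1}^{n} 1 / (j (j + theta - 1)), in coalescent time units."""
    theta = Fraction(theta)
    return 2 * sum(1 / (j * (j + theta - 1)) for j in range(1, n + 1))


n = 10
for theta in (Fraction(1, 2), 2, 10):
    age = mean_age_oldest_in_sample(n, theta)
    print(theta, float(age), float(mean_time_to_mrcas(n)))

# At theta = 2 the two means differ by exactly 2 / (n (n + 1)).
print(mean_age_oldest_in_sample(n, 2) - mean_time_to_mrcas(n), Fraction(2, n * (n + 1)))
```

At $\theta = 2$ expression (10.8) telescopes to $2\{1 - 1/(n+1)\}$, against $2(1 - 1/n)$ in (10.5), which gives the difference of order $n^{-2}$.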

The competing Poisson process theory of Section 10.2 shows that given
that a defining event occurs with $j$ genes present in the ancestry, the probability
that this is a mutation is $\theta/(j - 1 + \theta)$. Thus the mean number of
different allelic types found in the sample is

$$\sum_{j=1}^{n} \frac{\theta}{j - 1 + \theta},$$

and this is the value given in (3.85). The number of “mutation-caused”
defining events with $j$ genes present in the ancestry is, of course, either 0
or 1, and thus the variance of the number of different allelic types found in
the sample is

$$\sum_{j=1}^{n} \frac{\theta}{j - 1 + \theta} \left(1 - \frac{\theta}{j - 1 + \theta}\right) = \sum_{j=1}^{n} \frac{\theta(j - 1)}{(j - 1 + \theta)^2}.$$

This expression is easily shown to be identical to the variance formula
(3.86).

Even more than this can be said. The probability that exactly $k$ of the
defining events are “mutation-caused” is clearly proportional to
$\theta^k/\{\theta(\theta + 1) \cdots (\theta + n - 1)\}$, the proportionality factor not depending on $\theta$. Since this
is true for all possible values of $\theta$ and since the sum of the probabilities
over $k = 1, 2, \ldots, n$ must be 1, the probability distribution of the number
of different alleles in the sample must be given by (3.84).

The complete distribution of the allelic configuration in the sample as
given in (3.83) is not so simply derived. Kingman (1982a) employed the full
machinery of the coalescent process, together with a combinatorial argument
considering all possible paths from $\phi_n$ to $\phi_1$, to derive (3.83). That is,
(3.83) derives immediately from, and is best thought of as a consequence
of, the coalescent properties of the ancestry of the genes in the sample.
Indeed, it was an attempt to explain the form of (3.83) through a historical
argument that led Kingman to the coalescent concept (Kingman (2000)).

The sample is monomorphic if no mutants occurred in the coalescent
after the original mutation for the oldest allele. Moving up the coalescent,
this is the event that all defining events before this original mutation
is reached are amalgamations of lines of ascent rather than mutations. The
probability of this is

$$\prod_{j=1}^{n-1} \frac{j}{j + \theta} = \frac{(n - 1)!}{(1 + \theta)(2 + \theta) \cdots (n - 1 + \theta)}, \tag{10.9}$$

and this agrees, as it must, with the expression in (3.87).
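The mean, the variance, and the monomorphism probability (10.9) can all be recovered from one short calculation that treats the number of alleles as a sum of independent indicators, one per defining event, with success probability $\theta/(j - 1 + \theta)$. This is a sketch in exact arithmetic; the function name and the values $n = 6$, $\theta = 3/2$ are illustrative assumptions.

```python
from fractions import Fraction


def allele_count_distribution(n, theta):
    """dist[k] = Prob(k of the n defining events are mutation-caused)."""
    theta = Fraction(theta)
    dist = [Fraction(1)]
    for j in range(1, n + 1):
        p = theta / (j - 1 + theta)       # defining event with j genes is a mutation
        new = [Fraction(0)] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)      # the defining event is a coalescence
            new[k + 1] += prob * p        # the defining event is a mutation
        dist = new
    return dist


n, theta = 6, Fraction(3, 2)
dist = allele_count_distribution(n, theta)
mean = sum(k * p for k, p in enumerate(dist))
var = sum(k * k * p for k, p in enumerate(dist)) - mean ** 2
print(mean, var, dist[1])  # dist[1] is the probability of a monomorphic sample
```

The term with $j = 1$ has success probability 1, reflecting the mutation that created the oldest allele, so `dist[0]` is zero and at least one allelic type is always present.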

The results just described were found by moving up the coalescent, that 
is, in reverse real time, rather than down it in forward real time. The 
Hoppe urn process leading to the probability (9.45) in effect describes the 
coalescent moving forward in real time. In the genetic context the “urns” 
probability (9.45) is the probability that the new gene added to the ancestry 
of the sample as the ancestry size increases from j — 1 to j is a new mutant. 
This is identical to the corresponding probability in the coalescent argu- 
ments given above. The urn process was thought of as sampling “through 
space”, but we now think of it as sampling “through time”, adding new 
genes to the ancestry of the sample in forward time. This allows us to find 
all the coalescent-derived results given above. This illustrates an important 
property of the coalescent, that it allows both “forward” and “backward” 
time calculations. This is a substantial benefit, since some calculations are 
more easily carried out moving forward in time, and others are more easily 
carried out moving backward in time. It also implies that computer sim- 
ulation of the coalescent process is easy. Several probability distributions 
relating to samples of genes, discussed in Chapter 11, are difficult to de- 
rive analytically, and are then best found, or at least approximated, by 
simulation using the coalescent process of the sample. 
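A forward-in-time simulation of this kind can be sketched as follows. The probability that the $j$th gene added is a new mutant, $\theta/(j - 1 + \theta)$, is the one described above; the rule that a non-mutant gene copies the type of a uniformly chosen existing gene is the standard Hoppe urn rule and is assumed here rather than quoted from the text. The function name, seed, and parameter values are illustrative.

```python
import random


def hoppe_urn_sample(n, theta, rng):
    """Return the allele counts in a sample of n genes, oldest allele first."""
    counts = []   # counts[a] = number of genes of allelic type a
    labels = []   # allelic type of each gene added so far
    for j in range(1, n + 1):
        if rng.random() < theta / (j - 1 + theta):
            counts.append(1)              # new mutant: a new allelic type
            labels.append(len(counts) - 1)
        else:
            parent = rng.choice(labels)   # copy the type of a uniformly chosen gene
            counts[parent] += 1
            labels.append(parent)
    return counts


rng = random.Random(42)
n, theta, runs = 20, 2.0, 20_000
samples = [hoppe_urn_sample(n, theta, rng) for _ in range(runs)]
mean_k = sum(len(c) for c in samples) / runs
expected_k = sum(theta / (j - 1 + theta) for j in range(1, n + 1))
print(mean_k, expected_k)
```

Because alleles are created in order, the list returned is already age-ordered, so the same simulation can be used to estimate age-ordered sample properties such as those in Section 9.9.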

The next example concerns the Wright-Fisher infinitely many sites process.
The total number of sites $S_n$ segregating in the sample is identical
to the total number of mutations down the coalescent since the MRCAS,
since in the infinitely many sites model all such mutations are recorded in
the sample. We consider the (random) time $T_j$ during which there are
exactly $j$ lines of descent to the sample. We have shown that if mutation
is for the moment ignored, the mean of $T_j$ is $2/\{j(j - 1)\}$, $j = 2, 3, \ldots, n$.
In the Wright-Fisher infinitely many sites process the total mutation rate
is $j\theta/2$ along the $j$ lines of descent existing during the time $T_j$,
and this implies that the mean number of mutations to arise during this
time is $\theta/(j - 1)$. Summation over the values $j = 2, 3, \ldots, n$ gives the mean
number of segregating sites given in (9.53). This justifies the perceptive
comment of Watterson referred to below (9.52).

Further results follow immediately. The Poisson process equation (10.1) 
shows that the distribution of the number of mutations to arise between 
the times when there are $j - 1$ ancestors of the sample and $j$ ancestors is 

$$\text{Prob}(i \text{ mutations}) = \frac{j-1}{j-1+\theta}\left(\frac{\theta}{j-1+\theta}\right)^{i}, \quad i = 0, 1, 2, \ldots. \tag{10.10}$$

The distribution of the total number of segregating sites is the distribution 
of the sum of $n - 1$ random variables, the $j$th of which ($j = 2, \ldots, n$) has the 
distribution given in (10.10). This confirms the distribution arising from (9.52). 

The complete distribution of $S_n$ given in (9.57) may be found (Tavaré 
(1984)) from (10.10) by using the recurrence relation 

$$\text{Prob}(S_n = s) = \sum_{i=0}^{s} \text{Prob}(S_{n-1} = s - i)\, Q_n(i), \tag{10.11}$$

where $Q_n(i)$ is read as the probability (10.10) with $j = n$. 
This recurrence relation shows why the distribution of $S_n$ takes the form 
that it does. 
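The recurrence (10.11) is easy to iterate numerically. The following is a minimal sketch, not a definitive implementation: it assumes that the number of mutations arising while there are $j$ ancestral lines is geometric, with probability $(j-1)/(j-1+\theta)$ that the next event is a coalescence, and it reads $Q_n(i)$ as that geometric probability with $j = n$. The function names and the truncation point `s_max` are our own choices for illustration.

```python
from fractions import Fraction

def mutations_pmf(j, theta, i):
    """Prob(i mutations) while the sample has j ancestral lines (geometric form)."""
    p_coal = (j - 1) / (j - 1 + theta)   # probability the next event is a coalescence
    return p_coal * (1 - p_coal) ** i

def segregating_sites_pmf(n, theta, s_max):
    """Return [Prob(S_n = s) for s = 0..s_max] by iterating the recurrence."""
    pmf = [1] + [0] * s_max              # a single gene has no segregating sites
    for m in range(2, n + 1):
        q = [mutations_pmf(m, theta, i) for i in range(s_max + 1)]
        # Convolution: S_m = S_{m-1} + (mutations arising while there are m lines).
        pmf = [sum(pmf[s - i] * q[i] for i in range(s + 1))
               for s in range(s_max + 1)]
    return pmf

# Exact arithmetic is available by passing theta as a Fraction.
exact = segregating_sites_pmf(3, Fraction(1), 4)

# Floating-point run; the truncated mean should be close to theta * sum(1/j).
approx = segregating_sites_pmf(5, 2.0, 150)
mean_s = sum(s * p for s, p in enumerate(approx))
```

With `theta` passed as a `Fraction` the probabilities are exact rationals; with a float the same code gives a quick numerical check of the mean number of segregating sites.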



10.6 Coalescent Calculations: Exact Moran Model 
Results 

In this section we find exact results for the Moran model by a coalescent 
argument. The time unit used corresponds to the time between one birth 
and death event and the next. 

As we did for the Wright-Fisher model, we first consider the coalescent 
process itself. Here, however, we use a coalescent theory that is not only 
exact, but that also applies for a sample of any size, and in particular to 
the entire population of genes itself. This implies that all results deriving 
from coalescent theory, for example the topology of the coalescent tree, 
are identical to corresponding results for the exact Moran model coalescent 
process. 

The Moran model is a birth and death process, and it is convenient to 
think of a gene that does not die in a birth and death event as being its 
own descendant after that event has taken place. Consider, then, a sample 
of $n$ genes, where $n$ is not restricted to be small and could be any number 
up to and including the entire population size of $2N$. As we trace back the 
ancestry of these $n$ genes we will encounter a sequence of coalescent events 
reducing the size of the ancestry to $n - 1$, $n - 2$, ... genes and eventually to 
one gene, the most recent common ancestor of the sample. Suppose that in 
this process we have just reached a time when there are exactly $j$ genes in 
this ancestry. These will be “descendants” of $j - 1$ parental genes if one of 
these parents was chosen to reproduce and the offspring is in the ancestry 
of the sample of $n$ genes. The probability of this event is $j(j-1)/(2N)^2$. 
With probability $1 - j(j-1)/(2N)^2$ the number of ancestors remains at 
$j$. It follows that, as we trace back the ancestry of the genes, the number 
$T_j$ of birth and death events between the times when there are $j$ ancestor 
genes and $j - 1$ ancestor genes has, exactly, a geometric distribution with 
parameter $j(j-1)/(2N)^2$ and thus with mean $(2N)^2/\{j(j-1)\}$. From this, 
the mean of the time $T_{MRCAS}$ until the most recent common ancestor of 
all the genes in the sample is given by 

$$E(T_{MRCAS}) = \sum_{j=2}^{n} \frac{(2N)^2}{j(j-1)} = (2N)^2\left(1 - \frac{1}{n}\right) \tag{10.12}$$

birth and death events. In the particular case $n = 2N$ this is 

$$E(T_{MRCAP}) = 2N(2N - 1) \tag{10.13}$$



birth and death events. 

Since the various $T_j$'s are independent, the variance of $T_{MRCAS}$ is the 
sum of the variances of the $T_j$'s. This is 

$$\text{var}(T_{MRCAS}) = \sum_{j=2}^{n} \frac{(2N)^4}{j^2(j-1)^2} - \sum_{j=2}^{n} \frac{(2N)^2}{j(j-1)}. \tag{10.14}$$

The complete distribution of $T_{MRCAP}$ can be found, but the resulting 
expression is complicated and is not given here. 
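The mean and variance above can be checked directly from the geometric distribution of each $T_j$. The sketch below assumes only that $T_j$ is geometric with parameter $j(j-1)/(2N)^2$, so that its mean is $1/p$ and its variance is $(1-p)/p^2$; the function name and the argument `two_n` (standing for $2N$) are our own.

```python
from fractions import Fraction

def moran_tmrca_moments(n, two_n):
    """Exact mean and variance of the time to the MRCA of a sample of n genes,
    in birth and death events, for a Moran population of two_n = 2N genes."""
    mean = Fraction(0)
    var = Fraction(0)
    for j in range(2, n + 1):
        p = Fraction(j * (j - 1), two_n ** 2)   # geometric parameter for T_j
        mean += 1 / p                           # mean of a geometric(p) waiting time
        var += (1 - p) / p ** 2                 # variance of the same waiting time
    return mean, var

# Illustrative values only: 2N = 20, sample of 5 genes and the whole population.
sample_mean, sample_var = moran_tmrca_moments(5, 20)
population_mean, _ = moran_tmrca_moments(20, 20)
```

Because the arithmetic is exact, the sample mean can be compared with $(2N)^2(1 - 1/n)$ and the population mean with $2N(2N-1)$ without rounding error.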

We now introduce mutation. Consider again a sample of n genes and the 
sequence of birth and death events that led to the formation of this sample. 
We again trace back the ancestry of the n genes in the sample, and consider 
some birth and death event when this ancestry contains $j$ genes. With 
probability $j/(2N)$ the newborn created in the population at this birth and 
death event is in the ancestry of the sample, and with probability $u$ is a 
mutant. That is, the probability that at this birth and death event a new 
mutant gene is added to the ancestry of the sample is $ju/(2N)$. As for the 
Wright-Fisher model, we trace back upward along the lines of ascent from 
the sample, and do not trace back any further any line of ascent at a time 
when a new mutant arises in that line, so that at any mutation, the number 
of lines of ascent that we consider decreases by 1. 

A further decrease can occur from a coalescence for which the addition 
of a newborn to the ancestry of the sample does not produce a mutant 
offspring gene. If at any time there are $j$ lines in the ancestry, the probability 
of a coalescence not arising from a mutant newborn is $j(j-1)(1-u)/(2N)^2$. 

It follows from the above that the number of lines of ascent from the 
sample will decrease from $j$ to $j - 1$ at some birth and death event with 
total probability 

$$\frac{ju}{2N} + \frac{j(j-1)(1-u)}{(2N)^2} = \frac{2Nju + j(j-1)(1-u)}{(2N)^2}, \tag{10.15}$$

and we write the left-hand side as $v_j + w_j$, where $v_j$ and $w_j$ are defined 
in (9.106). The number of birth and death events until a decrease in the 
number of lines of ascent from $j$ to $j - 1$ follows a geometric distribution 
with parameter $v_j + w_j$. It follows from the theory of Section 10.2 that 
the mean number of birth and death events until the number of lines of 
ascent decreases from $j$ to $j - 1$ is $1/(v_j + w_j)$, and that this mean applies 
whatever the reason for the decrease. Tracing back to the mutation forming 
the oldest allele in the sample, we see that the mean age of this oldest allele 
is, exactly, 

$$\sum_{j=1}^{n} \frac{1}{v_j + w_j}, \tag{10.16}$$



and this is precisely the expression (9.105). 

The probability that a decrease in the number of ancestral lines from $j$ 
to $j - 1$ is caused by a mutation, given that such a decrease occurs, is 
$v_j/(v_j + w_j)$, or $\theta/(j - 1 + \theta)$ if $\theta$ is defined as $2Nu/(1 - u)$. The 
mean number of different alleles in the sample is thus, exactly, 

$$\sum_{j=1}^{n} \frac{\theta}{j - 1 + \theta}, \tag{10.17}$$



as given by (3.85). Extending this argument as for the Wright-Fisher case, 
the exact distribution of the number of alleles in the sample is found to be 
given by (3.84), as expected. 
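The exact Moran quantities in this argument are simple enough to evaluate by machine. The following sketch assumes that $v_j = ju/(2N)$ is the per-event probability of losing a line by mutation and that $w_j = j(j-1)(1-u)/(2N)^2$ is the per-event probability of losing a line by coalescence without mutation, as the two terms of (10.15) suggest; the function names and the argument `two_n` (for $2N$) are hypothetical.

```python
from fractions import Fraction

def moran_rates(j, two_n, u):
    """Return (v_j, w_j): per-event probabilities that j ancestral lines lose
    one line by mutation, or by a coalescence not involving a mutant newborn."""
    v = Fraction(j) * u / two_n
    w = Fraction(j * (j - 1)) * (1 - u) / two_n ** 2
    return v, w

def mean_age_oldest_allele(n, two_n, u):
    """Mean age, in birth and death events, of the oldest allele in the sample."""
    return sum(1 / sum(moran_rates(j, two_n, u)) for j in range(1, n + 1))

def mean_number_of_alleles(n, two_n, u):
    """Mean number of different alleles in the sample."""
    theta = two_n * u / (1 - u)          # theta defined as 2Nu/(1 - u)
    return sum(theta / (j - 1 + theta) for j in range(1, n + 1))

# Illustrative values only: 2N = 10 genes, mutation rate u = 1/100.
two_n, u = 10, Fraction(1, 100)
theta = two_n * u / (1 - u)
v3, w3 = moran_rates(3, two_n, u)
```

Passing `u` as a `Fraction` keeps every quantity exact, which makes it easy to confirm that $v_j/(v_j + w_j)$ and $\theta/(j - 1 + \theta)$ agree identically rather than approximately.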

The complete distribution of the sample allelic configuration, as with the 
Wright-Fisher model, requires a full description of the coalescent process. 
This full description is very similar to the approach of Trajstman (1974) in 
arriving at the exact distribution (3.97). 

The argument just used, while expressed as one concerning a sample 
of genes, applies equally for the entire population of genes. This occurs 
because, even in the entire population, at most one coalescent event can 
occur at each birth and death event. Thus all the exact sample Moran 
model results found by coalescent arguments apply for the population as a 
whole, with $n$ being replaced by $2N$. This explains the identity of the form 
of many exact Moran model sample and population formulas, especially 
those in Section 9.9. 

The above Moran model analysis refers to the infinitely many alleles 
model. A parallel analysis can be used to find the various exact infinitely 
many sites model results, and also to explain the identity in form between 
sample and population formulas. Of these, the most important is the dis- 
tribution (9.57) for the number of segregating sites in the sample: This is 
exact in the Moran model with the definition $\theta = 2Nu/(1 - u)$. If $n$ is 
replaced by $2N$, the same formula gives, exactly, the distribution of the 
number of segregating sites in the population. 



10.7 General Comments 

In this brief section we make some general comments about the approximate 
Wright-Fisher and the exact Moran coalescent processes. 

First, the coalescent concept is connected to the partition structure re- 
quirement in (9.47). This requirement was originally given as a reasonable 
one concerning the effects of losing one gene from a sample. But as we 
move backward from a sample up the coalescent, we lose genes one by one 
as coalescent events occur, and the consistency requirement (9.47) then 
states that the same sampling structure must apply at all times in the 
past, that is, at whatever time a sample was taken, and whatever size the 
sample. The noninterference requirement (9.48) has the natural interpreta- 
tion in the coalescent context that if, moving forward in time, one “branch” 
of the coalescent is lost, the properties of the remaining branches remain 
unchanged except for a change in sample size. 

Second, we have applied coalescent theory to provide approximate results 
for the Wright-Fisher model. However, the coalescent process was originally 
developed for the more general Cannings model, and all the approximate 
Wright-Fisher formulae apply with a suitable change in the definition of $\theta$ 
or of the time scale. 

Third, the calculations for both the Wright-Fisher model and the Moran 
model show that the mean time until the most recent common ancestor of 
the genes in a sample is almost independent of the sample size, provided 
that this is not too small, and that about half of this mean time arises 
from the coalescing of the penultimate two genes in the ancestry to the 
final one gene. This indicates that the general shape of the coalescent tree 
is one for which the long branches tend to arise in the early ancestry of the 
sample. However, this conclusion depends critically on the assumption that 
the population size remains constant, and for cases of increasing population 
size, a quite different tree shape can be expected. Theory is now available 
to handle more general cases in which the population size varies: See in 
particular Donnelly and Tavaré (1995) for a review of this (and other) 
aspects of coalescent theory. 
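The two claims in this remark can be seen numerically in the Wright-Fisher approximation. The sketch below assumes, as earlier in the chapter, that the mean time during which there are $j$ ancestral lines is $2/\{j(j-1)\}$ coalescent time units, so that the final pair of lines contributes a mean of one unit; the function names are our own.

```python
def mean_tmrca(n):
    """Mean time to the MRCA of n genes, in coalescent time units: the sum of
    2/(j(j-1)) for j = 2..n, which telescopes to 2(1 - 1/n)."""
    return sum(2.0 / (j * (j - 1)) for j in range(2, n + 1))

def share_of_final_pair(n):
    """Fraction of the mean time spent with only two ancestral lines."""
    return 1.0 / mean_tmrca(n)           # the two-line epoch has mean 1

# Mean time and final-pair share for a range of sample sizes.
rows = [(n, mean_tmrca(n), share_of_final_pair(n)) for n in (2, 5, 10, 50, 1000)]
```

The mean rises only from 1.8 to just under 2 as $n$ goes from 10 to 1000, and the share contributed by the last two lines falls toward one half.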

The fourth remark follows from the previous one, and in particular from 
the observation concerning the shape of the coalescent tree. In the Wright- 
Fisher model, several coalescent results, when applied formally for the 
population rather than a sample, appear to be more accurate than we 
initially have a right to expect. As one example, the mean age of the oldest 
allele in the population, given by (9.5), has the same form as the coalescent- 
derived formula (10.8) once allowance is made for the coalescent time scale, 
with $n$ replaced by $2N$ in the summation. Second, if the sample size $n$ in 
(10.9) is replaced by the population number $2N$, the heuristic value given 
in (5.69) is obtained. But it was shown in Section 5.7 that this provides an 
excellent approximation to the probability of population monomorphism. 

Fifth, the theory in this chapter covers only the simple cases of coales- 
cent processes, assuming, for example, selective neutrality and a constant 
population size. Many extensions of the theory, covering cases involving 
selection, recombination, and geographical features, already exist. It is not 
our purpose to discuss these here; they will be discussed 
in detail in Volume II. 

Sixth, it is clear that the coalescent lends itself to efficient simulations, 
either moving forward in real time (using, perhaps, the Hoppe urn), or 
moving backward in time. Indeed, its suitability for rapid simulation is 
perhaps one of its most important characteristics. In the following chapter 
various tests of selective neutrality will be described. The null hypothesis 
(neutrality) probability distribution of some of the statistics used in these 
tests is not easily arrived at analytically, and can often be found only 
empirically, using simulations based on the coalescent process. 
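As an illustration of how little is needed for a backward simulation, the sketch below draws the number of segregating sites in a sample under the infinitely many sites model. It assumes the Wright-Fisher coalescent approximation used in this chapter: with $j$ ancestral lines the time to the next coalescence is exponential with rate $j(j-1)/2$, and mutations fall on the $j$ lines as a Poisson process of total rate $j\theta/2$. The function name, the seed, and the parameter values are hypothetical choices.

```python
import random

def simulate_segregating_sites(n, theta, rng):
    """Simulate S_n by moving backward in time through the coalescent of n genes."""
    total = 0
    for j in range(n, 1, -1):
        # Time (in coalescent units) during which there are j ancestral lines.
        t_j = rng.expovariate(j * (j - 1) / 2.0)
        # Count Poisson mutation events of total rate j*theta/2 in (0, t_j)
        # by accumulating exponential gaps between successive events.
        rate = j * theta / 2.0
        clock = rng.expovariate(rate)
        while clock < t_j:
            total += 1
            clock += rng.expovariate(rate)
    return total

rng = random.Random(1)                   # fixed seed, purely for reproducibility
n, theta, reps = 10, 3.0, 20000
draws = [simulate_segregating_sites(n, theta, rng) for _ in range(reps)]
simulated_mean = sum(draws) / reps
expected_mean = theta * sum(1.0 / j for j in range(1, n))
```

The list `draws` is an empirical sample from the neutral distribution of the number of segregating sites, and any statistic computed from such draws inherits an empirical null distribution in the same way.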

Finally, and most important, the coalescent provides a beautiful frame- 
work in which to understand many properties of genetic populations and 
to arrive quickly at formulas that are less easily found by other methods. 
It also provides the framework for many inferential processes in population 
genetics. This is perhaps not surprising, since the initial motivation for its 
development arose from inferential questions. The influence of the coales- 
cent on population genetics generally cannot be overestimated. Further, it 
provides perhaps the closest link between the merging fields of classical 
evolutionary population genetics and human genetics, as is discussed briefly 
in the following section. 



10.8 The Coalescent and Human Genetics 

One of the main aims of research in human genetics is to map, or locate, 
disease genes, using single nucleotide polymorphisms (SNPs) of known 
location. More specifically, the aim is to assess whether there is any association 
between the two nucleotides possessed by any individual for this SNP and 
that individual’s disease status. A typical approach is that of a case-control 
study, in which a contingency table is formed with the disease status (af- 
fected or not affected) of each person in the sample considered indicated by 
the rows of a contingency table, and the SNP characteristic of each person 
indicated by the columns in the table. A chi-square test is then carried out 
to test for association between disease status and SNP constitution. 
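The chi-square calculation for such a table can be sketched as follows. The counts are invented purely for illustration, the three genotype columns (AA, Aa, aa) are one possible coding of the "SNP characteristic", and the function name is our own; the statistic is the usual Pearson sum of (observed - expected)^2 / expected over the cells.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column classifications.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts, invented for illustration only.
# Rows: affected, not affected. Columns: SNP genotypes AA, Aa, aa.
counts = [[30, 50, 20],
          [20, 50, 30]]
stat, dof = chi_square_statistic(counts)
```

For these invented counts the statistic is 4.0 on 2 degrees of freedom, below the 5% critical value of 5.991, so no association would be declared at that level.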

The logic behind this approach is as follows. The SNP is assumed to 
be quite old and selectively neutral, whereas the original mutation causing 
the disease is thought to be comparatively recent, perhaps arising only 
a few thousand years ago. Suppose that the site of a single nucleotide 
polymorphism and the disease locus are on the same chromosome. Then 
the original mutation will have arisen on a gamete containing one of the 
two nucleotides segregating at the polymorphic site. At that time there is 
the strongest possible association, or linkage disequilibrium, between 
the nucleotides at the site and the alleles at the disease locus. 

This linkage disequilibrium will break down over time, because of re- 
combination between the site and the disease locus, following an equation 
of the form of (2.85), amended if necessary, using the recurrence relations 
(2.94), to allow for selection at the disease locus. However, if the site and 
the disease locus are very close, the linkage disequilibrium between the dis- 
ease locus and the site will be retained for many generations, leading to 
an association that might be picked up in the present-day data by the chi- 
square procedure outlined above. On the other hand, if the disease locus 
and the site are not closely linked, the linkage disequilibrium between the 
alleles at the disease locus and the nucleotides segregating at the site will 
rapidly break down, and no significant association between the two loci 
should be observed. Because of this, the chi-square test for association in 
a “case-control” study is a surrogate for a linkage test. 
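The contrast between closely and loosely linked sites can be made concrete. The sketch below assumes that, with no selection, the disequilibrium decays geometrically, each generation retaining a fraction $1 - R$ of its value, where $R$ is the recombination fraction between the site and the disease locus. This is the standard neutral form and is taken here as an assumption about (2.85), which is not reproduced in this section. The function names and numerical values are hypothetical.

```python
import math

def ld_remaining(recomb_fraction, generations):
    """Fraction of the initial linkage disequilibrium left after the given number
    of generations, assuming geometric decay by a factor (1 - recomb_fraction)."""
    return (1.0 - recomb_fraction) ** generations

def generations_to_halve(recomb_fraction):
    """Generations needed for the disequilibrium to fall to half its initial value."""
    return math.log(0.5) / math.log(1.0 - recomb_fraction)

# Illustrative values: a tightly linked site (R = 0.001) and an unlinked one (R = 0.5).
tight_after_100 = ld_remaining(0.001, 100)
loose_after_10 = ld_remaining(0.5, 10)
```

Under this assumption a site with $R = 0.001$ retains about 90% of its disequilibrium after 100 generations, whereas an unlinked site retains less than 0.1% after only 10.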

There is a potential problem with this procedure. It was shown in Section 
8.4 that linkage disequilibrium can arise with unlinked loci if the population 
of interest exhibits geographical structure. Thus establishing a significant 
association between disease status and the nucleotides carried at some site 
does not automatically imply that the site is linked, let alone closely linked, 
to the disease locus. Tests using the association concept, but which over- 
come this problem, have been developed in the human genetics literature. 
Our main interest here is not in these tests, so we do not discuss them further, 
other than to note the awareness in the human genetics literature of the 
importance of population structure. 

If there was only one originating disease mutation, all disease genes in the 
sample trace back to it in a coalescent process. However, our observations 
come from the polymorphic nucleotide site, and because of recombination, 
the disease locus coalescent process might well differ from the coalescent 
process at this site. Coalescent processes of two loci with recombination, 
to be discussed in Volume II, are then needed to assess the extent to 
which marker site data can be used to infer properties of the disease locus 
coalescent process. 

Here we consider properties of the coalescent process at the disease locus 
on its own, without further reference to data arising at a site linked to the 
disease locus. One important feature of the form of data used for testing 
for linkage between a segregating site and the disease locus is that it is 
not obtained by random sampling: The disease gene will be at a far higher 
frequency in the individuals from whom the data are obtained than in 
the population at large. It follows from this that a conditional coalescent 
theory is needed rather than the unconditional theory outlined earlier in 
this chapter. The properties of a conditional process differ considerably 
from those of an unconditional process. These have been investigated by 
Wiuf and Donnelly (1999), and we now outline some of their results. 

Suppose that there are i disease genes and n — i normal genes in a sample 
of n genes. If only one originating disease mutation occurred in the ancestry 
of the sample, the i disease genes trace their ancestry back to a common 
ancestor gene that is not the ancestor gene of any of the n — i normal 
genes in the sample. The disease mutation must have occurred either in 
that common ancestor or in some ancestor gene of it. In the latter case it 
must have occurred later, in real time, than the coalescence of that common 
ancestor and the n — i normal genes in the contemporary sample. 

Wiuf and Donnelly focus on the estimation of the time T back to the 
original disease mutation. They approach the analysis by first finding the 
probability Q(i) that in a sample of n genes, exactly i have a most recent 
common ancestor gene that is not the ancestor gene of any of the remaining 
n — i genes in the sample. (An ancestry of this form is similar to the concept 
of a monophyletic group, discussed in Section 12.4.) They find that 



$$Q(i) = \frac{2\,(i-1)!\,(n-i)!}{(i+1)\,(n-1)!}. \tag{10.18}$$



This probability is very small when i is approximately half the value of 
n, as is likely in a sample used for testing for linkage. Nevertheless, the 
event that exactly i have a most recent common ancestor gene that is not 
the ancestor gene of any of the remaining n — i genes did occur, so that 
calculations for the conditional coalescent process to be analyzed can be 
expected to differ substantially from those of a standard coalescent, where 
no conditioning is assumed. 
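To see how small this probability is, it can be evaluated exactly. The sketch below takes $Q(i) = 2(i-1)!(n-i)!/\{(i+1)(n-1)!\}$ as in (10.18); the function name and the sample sizes are illustrative choices of our own.

```python
from fractions import Fraction
from math import factorial

def q_prob(i, n):
    """Probability that exactly i genes in a sample of n have a most recent common
    ancestor that is not an ancestor of any of the remaining n - i genes."""
    return Fraction(2 * factorial(i - 1) * factorial(n - i),
                    (i + 1) * factorial(n - 1))

small_case = q_prob(3, 6)        # a sample small enough to check by hand
balanced = q_prob(50, 100)       # i is half of n, as in a typical case-control sample
```

For $n = 6$ and $i = 3$ the value is 1/20, while for $n = 100$ and $i = 50$ it is below $10^{-29}$, which shows how strongly the conditioning event constrains the genealogy.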

The next calculation relevant to their analysis concerns the probability 
distribution of the random variable Y, where Y is the number of ancestor 
genes of the normal genes in the sample at the time of the most recent 
common ancestor of the disease genes in the sample. Wiuf and Donnelly 
find that 

$$\text{Prob}(Y = y) = \frac{\binom{n-y-2}{i-2}\binom{y+1}{2}}{\binom{n}{i+1}}. \tag{10.19}$$

This distribution has strong asymmetry properties. For example, when 
$n = 6$ and $i = 3$, the possible values 1, 2, and 3 for $Y$ have respective 
probabilities 1/5, 2/5, and 2/5. This asymmetry arises, of course, from the 
conditional nature of the process examined. 
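The numerical example can be reproduced in a few lines. The sketch below takes the distribution of $Y$ to be $\binom{n-y-2}{i-2}\binom{y+1}{2}/\binom{n}{i+1}$, the form that yields the probabilities 1/5, 2/5, and 2/5 quoted for $n = 6$ and $i = 3$, and assumes that $y$ ranges over $1, \ldots, n - i$ with $i \geq 2$. The function name is our own.

```python
from fractions import Fraction
from math import comb

def prob_y(y, i, n):
    """Prob(Y = y): y ancestors of the n - i normal genes exist at the time of the
    most recent common ancestor of the i disease genes (requires i >= 2)."""
    return Fraction(comb(n - y - 2, i - 2) * comb(y + 1, 2), comb(n, i + 1))

# The worked example from the text: n = 6, i = 3, so y = 1, 2, 3.
dist = {y: prob_y(y, 3, 6) for y in range(1, 4)}

# The probabilities should also sum to 1 for other sample compositions.
total_20_7 = sum(prob_y(y, 7, 20) for y in range(1, 20 - 7 + 1))
```

That the probabilities sum to 1 over $y = 1, \ldots, n - i$ is a useful check on the assumed range of $Y$.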

Wiuf and Donnelly then use these results, together with a competing 
Poisson process argument, to find a limiting conditional distribution for T, 
the limit being taken as the disease mutation rate approaches 0. It is found 
that as n — > +oc with ijn — f fixed, the mean value of T approaches the 
value 

E (T) = (10.20) 

coalescent time units. In the case / = 1/2 this is about 1.38 time units, 
rather less than the unconditional value of about 2 time units established 
above for the coalescence of the ancestry of a sample of genes to their most 
recent common ancestor. With / replaced by 1 — p, it is, however, identical 
to the conditional mean fixation time given in (5.34). This identity is not 
a coincidence, and it can be established by an argument using reversibility 
and conditional mean fixation times. 
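A short numerical check of the limiting mean follows. It takes the mean to be $-2f \log f/(1 - f)$ coalescent time units, the form consistent with the value of about 1.38 quoted for $f = 1/2$; the function name is our own.

```python
import math

def mean_age(f):
    """Limiting mean time back to the original disease mutation, in coalescent
    time units, when the disease genes form a fraction f of the sample (0 < f < 1)."""
    return -2.0 * f * math.log(f) / (1.0 - f)

half = mean_age(0.5)     # equals 2 log 2
rare = mean_age(0.01)    # a mutation carried by 1% of the sample
```

The value at $f = 1/2$ is $2 \log 2$, and the mean falls toward 0 as $f$ decreases, in line with the expectation that rare mutations are young.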




11 

Looking Backward: Testing the 
Neutral Theory 



11.1 Introduction 

The coalescent theory described in the previous chapter assumes selective 
neutrality at the gene locus or nucleotide site considered. In this chapter 
we shall consider the question, May we in fact reasonably assume selective 
neutrality at this gene locus or nucleotide site? 

The hypothesis of selective neutrality is more frequently called the “non- 
Darwinian” theory, and was promoted mainly by Kimura (1968). Under 
this theory it is claimed that, whereas the gene substitutions responsible 
for obviously adaptive and progressive phenomena are clearly selective, 
there exists a further class of gene substitutions, perhaps in number far 
exceeding those directed by selection, that have occurred purely by chance 
stochastic processes. Stochastic changes in gene frequency have been stud- 
ied extensively in this book, and they can certainly lead to substitutions 
in which the replacing gene has no selective advantage over the replaced 
gene. A better name for the theory would thus be the “extra-Darwinian” 
theory, although here we adhere to the standard expression given above. 

In a broader sense, the theory asserts that a large fraction of currently 
observed genetic variation between and within populations is nonselective. 
In this more extreme sense the theory has been described as the “neutral 
alleles” theory, although this term and the term “non-Darwinian” have 
been used interchangeably in the literature and will be so used here. 

This theory has, of course, been controversial, not only among theoreticians 
but also among practical geneticists, and the question whether certain 
specific substitutions have been neutral was argued quite strongly when 
the theory first appeared (cf. King and Jukes (1975), Blundell and Wood 
(1975), Langley and Fitch (1974), and Jukes (1976)). This controversial 
aspect of the theory has largely died down, perhaps because the inferential 
questions now of major interest often focus on comparatively short time 
scales, for which random changes in gene frequency are relatively more im- 
portant than selective changes. Thus for these questions selective neutrality 
may often reasonably be assumed for many loci. On the other hand, tests 
for neutrality and tests associated with neutrality appear frequently in the 
current literature, as some of the material described below demonstrates. 

Throughout this chapter, selective neutrality, the null hypothesis being 
tested, is assumed, so that all calculations give null hypothesis values. In 
broad terms the testing procedures used assess, in one way or another, 
whether the sample data at hand conform reasonably to what is expected 
under this null hypothesis. In all cases, diffusion theory approximations 
are used for the theoretical calculations needed. The specific evolutionary 
model assumed is often not important, since in particular, the conditional 
distributions used in tests based on the infinitely many alleles model are 
independent of the specific model, provided in effect that the requirements 
needed for coalescent theory to apply are met. 

The data used to assess the non-Darwinian hypothesis are the current 
gene frequencies at various loci in a population, DNA and protein sequence 
frequencies and patterns, and differences of gene and sequence frequencies 
and structure between populations and species. Several reviews of statisti- 
cal testing for neutrality have appeared in the literature; see, for example, 
Kreitman (2000) for a general review, and Fay and Wu (2001), Nielsen 
(2001), and Sabeti et al. (2002) for reviews focusing on genomic data. Fay 
and Wu also provide a substantial list of references and recommended read- 
ing. Various testing procedures use different forms of data; see, for example, 
Slatkin (1982), McDonald and Kreitman (1991), Hudson, Kreitman, and 
Aguadé (1987), Sawyer and Hartl (1992), and Li et al. (2003) for recent 
examples of this. 

The tests that we consider focus exclusively on tests using “within popu- 
lation” data whose theoretical background is based on the infinitely many 
alleles and the infinitely many sites theory discussed in Chapter 9. Thus 
we do not consider procedures such as that of Sabeti et al. (2002) that are 
based on haplotype data and the lengths of haplotype blocks. Only com- 
paratively simple cases will be considered in this volume. For example, no 
account will be taken here of geographical subdivision, so that panmixia in 
all populations is implicitly assumed. The extent to which subdivision must 
be taken into account in testing neutrality is not clear. We have seen, as a 
result of the calculations in (3.127) and (3.128), that only a small amount 
of isotropic migration is necessary for a subdivided population to act 
effectively as one large random-mating population. On the other hand, if 
migration occurs only between adjoining subpopulations, the subdivisional 
structure is more important. 

Stationarity is assumed throughout, as is random mating and a constant 
mutation rate. A constant population size, not varying over generations, is 
also assumed. This assumption is clearly not correct for the human popula- 
tion, and the shape of the coalescent tree for humans is not well described 
by the coalescent theory of Chapter 10. A more complete discussion of neu- 
trality testing should take account of these factors, in particular that of a 
varying population size, and a much broader discussion than that provided 
here will be given in Volume II. 

All the assumptions listed above are quite strong ones, and the the- 
ory outlined below is often applied without any substantial assessment of 
whether they are reasonable for the case at hand. This is unfortunate, 
since the tests of neutrality that we discuss are in effect tests of neutrality 
together with the various simplifying assumptions made in the analysis, 
often not known or overlooked by the investigator. As an example, one of 
the tests outlined below was originally put forward as a test of constancy 
of mutation rate, assuming neutrality, but it may equally well be used as a 
test of neutrality itself, assuming a constant mutation rate. Thus rejection 
of the neutral theory is in effect rejection of this theory together with all 
the assumptions, implicit and explicit, in the analysis. 

Both the infinitely many alleles and the infinitely many sites models are 
used in the neutrality testing procedures. The latter model is, of course, 
appropriate for data consisting of DNA sequences. In the literature, essen- 
tially all of the neutrality testing theory depending on this model relates to 
the case in which only one sequence is considered, normally corresponding 
to one single gene. This theory depends on the general theory for com- 
pletely linked sites investigated in some detail in Chapter 9, and which is 
employed in this chapter in Sections 11.3.2-11.3.6. However, the neutrality 
testing literature abounds with cases in which several genes are consid- 
ered, often unlinked or essentially so. Here a revised theory, not generally 
considered in the literature, is needed: This theory is discussed in Section 
11.3.7. Finally, the case in which the data tested relate to unlinked sites is 
discussed in Section 11.3.8. 

Since all calculations are based on sample data, we discontinue the con- 
vention adopted in Chapter 9 of using suffixes to distinguish between 
population and sample variables. Thus, for example, the random variable 
$S_n$ of Chapter 9 is now denoted simply by $S$. 



11.2 Testing in the Infinitely Many Alleles Models 

11.2.1 Introduction 

We consider first tests based on the infinitely many alleles model described 
in some detail in Sections 3.6 and 9.3. This model possesses the attractive 
feature that an exact sampling theory is available for it: The sampling 
distribution formulas relevant to these tests are given in (3.83)-(3.85). The 
exact theory arises because the conditional distribution (9.30) deriving from 
(3.83)-(3.85) is free of all unknown parameters and thus can be used for an 
objective test of the neutrality hypothesis. Most tests that take advantage 
of this fact are tests of whether the observed value of some test statistic 
derived from the data in a sample differs significantly from its neutral 
theory expectation as given by the conditional distribution (9.30). Tests 
using the age-dependent extension of this conditional distribution are also 
possible, and will be discussed in Section 11.2.4. 



11.2.2 The Ewens and the Watterson Tests 

The first objective tests of selective neutrality based on the infinitely many 
alleles model were put forward by Ewens (1972) and Watterson (1977a). 
The broad aim of both tests was to assess whether the observed values 
$\{a_1, \ldots, a_n\}$ in (9.30) conform reasonably to what is expected under 
neutrality, that is, under the formula (9.30), given the sample size $n$ and the 
observed number $k$ of alleles in the sample. It is equivalent to use the 
observed numbers $\{n_1, \ldots, n_k\}$ defined in connection with (3.88) and to assess 
whether these conform reasonably to their conditional probability given $n$ 
and $k$, namely, 

$$\text{Prob}(n_1, n_2, \ldots, n_k \mid k) = \frac{n!}{|S_n^k|\, k!\, n_1 n_2 \cdots n_k}. \tag{11.1}$$

The Ewens and the Watterson testing procedures differ only in the test 
statistic employed. The Ewens method used as test statistic the arbitrarily 
chosen information quantity $-\sum_{j=1}^{k} x_j \log x_j$, where $x_j = n_j/n$. The 
reason for this arbitrary choice was that in the standard Neyman-Pearson 
theory of testing statistical hypotheses, the test statistic is found by con- 
sidering the ratio of the probability of the data under the null hypothesis 
(in this case neutrality) to the corresponding probability under the alterna- 
tive hypothesis (here, selection). Since no unique selective scheme exists, no 
unique test statistic is found under this procedure, so that any reasonable 
but nevertheless arbitrary test statistic measuring genetic variation may be 
chosen. The statistic given above satisfies this criterion. 

However, Watterson (1977a) found that for several important selective 
schemes and for small selective values, the conditional probability of the 
observed sample numbers $\{n_1, \ldots, n_k\}$, given $k$, is of the form 

$$\text{Prob}\{n_1, \ldots, n_k \mid k\} = \frac{n!}{|S_n^k|\, k!\, n_1 n_2 \cdots n_k}\left(1 + A\beta + O(\beta^2)\right). \tag{11.2}$$

In this expression, $A$ is defined by 

$$A = n\left((1+\theta)^{-1} - nf(n+\theta)^{-1}\right)(n+1+\theta)^{-1}, \tag{11.3}$$

where $f$, the observed sample homozygosity, is defined as 

$$f = \sum_{j=1}^{k} \frac{n_j^2}{n^2}, \tag{11.4}$$

and $\beta$ is a parameter depending on the nature of the selective scheme. Since 
the ratio of (11.2) to the neutral value (11.1) depends on the observations, 
for small $\beta$ at least, only through $f$, Watterson chose $f$ as the appropriate 
test statistic. This is superior to the information statistic, and we now 
discuss its application to the testing procedure. 

The first step is to establish what values of / will lead to rejection of 
the neutral hypothesis. Clearly, / will tend to be smaller under heterotic 
selection than under neutrality, since this form of selection will tend to 
equalize allele frequencies compared to the neutral case, thus decreasing /. 
Under a deleterious mutations model, where we expect one high-frequency 
“superior” allele and a collection of low-frequency deleterious alleles, / will 
tend to exceed its neutral theory value. Thus neutrality is rejected in favor 
of a heterotic scheme if / is “too small” and in favor of a deleterious alleles 
scheme if / is “too large” . 

To determine how large or small / must be before neutrality is rejected, 
it is necessary to find its neutral theory probability distribution. This may 
be found in principle from (11.1). In practice, difficulties arise with the 
mathematical calculations because of the form of the distribution (11.1), 
and other procedures are needed. 

For any observed data set $\{n_1, \ldots, n_k\}$, a computer-intensive exact
approach proceeds by taking n and k as given, and summing the probabilities
in (11.1) over all those $n_1, n_2, \ldots, n_k$ combinations that lead to a value of
f more extreme than that determined by the data. This procedure is
increasingly practicable with present-day computers, but will still be difficult
in practice if an extremely large number of sample points is involved.
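
The exact approach can be sketched in a few lines for small samples. The sketch below assumes that (11.1) is the neutral conditional probability n!/(|S_n^k| k! n_1 n_2 ... n_k) of an ordered vector of allele numbers, which is the leading factor of (11.2). The function names are illustrative only, and both tail probabilities are computed inclusive of the observed value, which is one possible reading of "more extreme".

```python
from collections import Counter
from fractions import Fraction
from math import factorial, prod


def stirling1_unsigned(n, k):
    """|S_n^k|: the number of permutations of n objects having exactly k cycles."""
    row = [1]                                   # row for 0 objects
    for m in range(1, n + 1):
        row = row + [0]
        row = [0] + [row[j - 1] + (m - 1) * row[j] for j in range(1, m + 1)]
    return row[k]


def partitions(n, k, largest=None):
    """Yield the partitions of n into exactly k positive parts, largest first."""
    if largest is None:
        largest = n
    if k == 1:
        if 1 <= n <= largest:
            yield (n,)
        return
    for p in range(min(largest, n - k + 1), 0, -1):
        if p * k < n:                           # remaining parts cannot exceed p
            break
        for rest in partitions(n - p, k - 1, p):
            yield (p,) + rest


def partition_probability(parts):
    """Neutral probability of an unlabeled allelic configuration, given n and k."""
    n, k = sum(parts), len(parts)
    # The assumed form of (11.1) applies to an ordered vector; k!/prod(mult!)
    # orderings share the same value, so k! cancels and prod(mult!) remains.
    repeats = prod(factorial(m) for m in Counter(parts).values())
    return Fraction(factorial(n), stirling1_unsigned(n, k) * prod(parts) * repeats)


def exact_tail_probabilities(observed):
    """Return (P(f <= f_obs), P(f >= f_obs)) under neutrality, given n and k."""
    n, k = sum(observed), len(observed)
    obs = sum(c * c for c in observed)          # n^2 * f, kept as an integer
    lower = upper = Fraction(0)
    for parts in partitions(n, k):
        p = partition_probability(parts)
        ss = sum(c * c for c in parts)
        if ss <= obs:
            lower += p
        if ss >= obs:
            upper += p
    return lower, upper


# Hypothetical small data set: n = 10 genes, k = 3 alleles.
lower, upper = exact_tail_probabilities([8, 1, 1])
print(float(lower), float(upper))
```

The number of partitions grows rapidly with n, which is the practical difficulty referred to above; the enumeration is feasible only for modest sample sizes.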

An approximate approach is to use a computer simulation to draw a large 
number of random samples from the distribution in (11.1): Efficient ways 
of doing this are given by Watterson (1978). If a sufficiently large number 
of such samples is drawn, a reliable empirical estimate can be made of 
various significance level points. This has been done by Watterson (1978, 
Table 1), using a method close to that of Hoppe’s urn (discussed in Section 
9.5.3), and his table, expanded by further simulations of Anderson (1978), 
is given in Appendix B. Use of the table in Appendix B, with interpolation

Species        n     k    n_1   n_2   n_3   n_4   n_5   n_6   n_7
willistoni     582   7    559   11    7     2     1     1     1
tropicalis     298   7    234   52    4     4     2     1     1
equinoxalis    376   5    361   5     4     3     3
simulans       308   7    91    76    70    57    12    1     1

Table 11.1. Drosophila sample data



Species        f        E(f)     var(f)   P       P_sim
willistoni     0.9230   0.4777   0.0295   0.007   0.009
tropicalis     0.6475   0.4434   0.0253   0.130   0.134
equinoxalis    0.9222   0.5654   0.0343   0.036   0.044
simulans       0.2356   0.4452   0.0255           0.044

Table 11.2. Sample statistics, means, variances, and probabilities for the data of
Table 11.1.

for values of k and n not listed, gives probably the most direct and useful
test of selective neutrality using f. Examples of computing significance
levels by both the exact method and the simulation method are given in
Watterson (1978, Table 4).

The simulation method allows calculation of tables of E(f | k) and var(f | k)
for various k and n values, and some representative values are given in
Appendix C. We discuss a possible use of these mean and variance values
below.
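
The simulation approach can be illustrated as follows. This is a minimal sketch, not Watterson's (1978) algorithm: it draws samples by Hoppe's urn for an arbitrary working value of θ and keeps only those samples having exactly k alleles. Because the neutral distribution of the allele numbers given k does not depend on θ, the retained samples follow the conditional distribution (11.1) whatever working value is used. The function names and the small data set are hypothetical.

```python
import random


def hoppe_urn_sample(n, theta, rng):
    """Allele counts for n genes drawn sequentially by Hoppe's urn."""
    counts = []
    for i in range(n):                          # i genes have been drawn so far
        if rng.random() < theta / (theta + i):
            counts.append(1)                    # the new gene is a new allele
        else:
            r = rng.randrange(i)                # copy a uniformly chosen earlier gene
            acc = 0
            for j, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[j] += 1
                    break
    return counts


def simulate_f_given_k(n, k, theta, reps, seed=0):
    """Simulated neutral values of the homozygosity f, conditional on k alleles."""
    rng = random.Random(seed)
    values = []
    while len(values) < reps:
        counts = hoppe_urn_sample(n, theta, rng)
        if len(counts) == k:                    # rejection step: condition on k
            values.append(sum(c * c for c in counts) / n**2)
    return values


# Hypothetical example: n = 10, k = 3, observed configuration (8, 1, 1), f = 0.66.
values = simulate_f_given_k(n=10, k=3, theta=1.0, reps=4000, seed=1)
p_upper = sum(v >= 0.66 - 1e-12 for v in values) / len(values)
print(p_upper)
```

The rejection step is wasteful when the working value of θ makes k alleles unlikely, so in practice θ would be chosen near the value for which the mean number of alleles is k.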

We illustrate the above methods of testing neutrality by applying them 
to particular data. The data concern numbers and frequencies of different 
alleles at the Esterase-2 locus in various Drosophila species and are quoted
by Ewens (1974b) and Watterson (1977a). Since the data were obtained by 
electrophoresis, it is quite possible that the infinitely many alleles model is 
not appropriate for them, so that the calculations and the analysis given 
here are for illustrative purposes only. 

The data are displayed in Table 11.1. For each set of data we compute
f, the observed homozygosity. Then the exact neutral theory probability
P (given in Table 11.2) that the homozygosity is more extreme than its
observed value may be calculated (except for the D. simulans case where
the computations are prohibitive). The simulated probabilities P_sim are also
given in Table 11.2; these are in reasonable agreement with the exact values,
so that some confidence can be placed in the values listed in Appendix
B, which were found by simulation. The conclusion that we draw is that
significant evidence of selection appears to exist in all species except D.
tropicalis.
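
As a check on the arithmetic, the observed homozygosities in Table 11.2 can be recomputed from the allele numbers in Table 11.1 using (11.4). The sketch below is illustrative only. The value n_2 = 11 for D. willistoni is inferred from the requirement that the allele numbers sum to n = 582.

```python
# Allele numbers from Table 11.1.
samples = {
    "willistoni": [559, 11, 7, 2, 1, 1, 1],
    "tropicalis": [234, 52, 4, 4, 2, 1, 1],
    "equinoxalis": [361, 5, 4, 3, 3],
    "simulans": [91, 76, 70, 57, 12, 1, 1],
}


def homozygosity(counts):
    """Sample homozygosity f of (11.4): the sum of squared allele frequencies."""
    n = sum(counts)
    return sum(c * c for c in counts) / n**2


f_values = {species: homozygosity(counts) for species, counts in samples.items()}
for species, f in f_values.items():
    print(f"{species:12s} n = {sum(samples[species]):3d}  f = {f:.4f}")
```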

We conclude with three general remarks about the Watterson testing
procedure. First, the procedure tests for selective neutrality for one gene
locus in one species or location only. If data from several different locations
or species are available, it may be judged inappropriate to carry out a formal
testing procedure for each data set as has been done above. In such a case
it might be preferred to calculate, for each data set, the index function

$$\frac{f - E(f \mid k, n)}{\sqrt{\text{var}(f \mid k, n)}}, \tag{11.5}$$

which measures the deviation of f from its neutral theory mean in standard
deviation units. A visual comparison of this index function for all the data
at hand might provide useful evidence on the neutrality question. One
problem with this procedure is that the distribution of f is not close to
the normal distribution (see, for example, Figure 1 in Watterson (1978)),
so that the usual two standard deviation limits, arrived at ultimately from
the normal distribution, may not be of much value. The values of the index
function for the four species of Table 11.1 are 2.59, 1.28, 1.93, and −1.31
respectively. These values agree reasonably with the probability levels in
Table 11.2 except for the last one: It is clear that values of f falling short of
the mean are significant at a smaller number of standard deviations than
those in excess of the mean. This is because of the skewness to the right of
the distribution of f.
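
The index function values quoted above follow directly from the entries of Table 11.2, as the following short sketch shows. The variable names are illustrative.

```python
from math import sqrt

# (f, E(f | k, n), var(f | k, n)) for each species, taken from Table 11.2.
table_11_2 = {
    "willistoni": (0.9230, 0.4777, 0.0295),
    "tropicalis": (0.6475, 0.4434, 0.0253),
    "equinoxalis": (0.9222, 0.5654, 0.0343),
    "simulans": (0.2356, 0.4452, 0.0255),
}


def index_function(f, mean, variance):
    """Deviation of f from its neutral mean in standard deviation units, as in (11.5)."""
    return (f - mean) / sqrt(variance)


index_values = {s: index_function(*row) for s, row in table_11_2.items()}
for species, value in index_values.items():
    print(f"{species:12s} {value:6.2f}")
```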

Second, the Watterson test, as with all tests of selective neutrality, is not
a powerful one. This lack of power arises from the association between the
genes in the data due to their common evolutionary history. As a result the
test might not reject the neutral hypothesis even when appreciable selection
exists, especially when the sample size is small.

Finally, the Watterson procedure is framed above as a test of whether
the observed value of f conforms reasonably to its conditional null hypothesis
distribution, given the observed value k of the number of alleles, and
the observed value n of the number of genes, in the sample. An equivalent
procedure is to compare the estimate of θ derived from (9.32), based on k,
with the estimate of θ derived from (9.35), based on f. If these are
compatible, the neutral theory is accepted. Suppose, for example, that n = 200
and k = 10. Then the estimate of θ based on the value k = 10 is found
from (9.32) to be 2.065. If the same estimate arises from the sample
homozygosity, then from (9.35) the sample homozygosity f would be 0.326.
This is well within a 95% probability range for f when n = 200, k = 10
(see Appendix B), and is very close to the mean of f, given n = 200, k = 10
(see Appendix C). Thus with k = 10, n = 200, and f = 0.326 the neutral
theory would be accepted, and this conclusion agrees with that reached
under the Watterson testing procedure.
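
The numerical example can be reproduced as follows. The sketch rests on two assumptions about equations not shown in this section: that (9.32) equates k to the neutral mean number of alleles, θ/θ + θ/(θ+1) + ... + θ/(θ+n−1), and that (9.35) equates f to 1/(1+θ). Both assumptions reproduce the values 2.065 and 0.326 quoted above. The function names are illustrative.

```python
def expected_number_of_alleles(theta, n):
    """Neutral mean number of alleles in a sample of n genes (assumed form of (9.32))."""
    return sum(theta / (theta + j) for j in range(n))


def theta_from_k(k, n, lo=1e-9, hi=1e6):
    """Solve E(K) = k for theta by bisection; E(K) is increasing in theta."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if expected_number_of_alleles(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


theta_hat = theta_from_k(k=10, n=200)
f_implied = 1 / (1 + theta_hat)        # homozygosity implied by the assumed form of (9.35)
print(round(theta_hat, 3), round(f_implied, 3))
```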

We mention this alternative way of viewing the testing procedure since it 
is very similar in spirit to the Tajima (1989) testing procedure using DNA 
sequence data and the infinitely many sites model, discussed in Section 
11.3.

11.2.3 Procedures Based on the Conditional Sample
Frequency Spectrum

In this section we outline two procedures based on the sample “frequency
spectrum”. We have defined $A_i$ as the (random) number of alleles in the
sample that are represented by exactly i genes. For given k and n, the mean
value of $A_i$ can be found directly from (9.30); it is given by

$$E(A_i \mid k, n) = \frac{n!}{i\,(n-i)!}\,\frac{|S_{n-i}^{k-1}|}{|S_n^k|}. \tag{11.6}$$



In this formula the $S_n^k$ are values of Stirling numbers of the first kind as
discussed after (3.84). The array of the $E(A_i \mid k, n)$ values for i = 1, 2, ..., n
is the sample conditional mean frequency spectrum, and the corresponding
array of observed values $a_i$ is the observed conditional frequency spectrum.
The first approach to assessing neutrality that we outline is an informal
one, consisting of a simple visual comparison of the observed and the
expected sample frequency spectra. Coyne (1976) provides an illustration of
this approach. In Coyne’s data, n = 21, k = 10, and

$$n_1 = n_2 = \cdots = n_9 = 1, \quad n_{10} = 12.$$

Direct use of (3.57) shows that given that k = 10 and n = 21,

$$E(A_i \mid k = 10, n = 21) = \frac{21!}{i\,(21-i)!}\,\frac{|S_{21-i}^{9}|}{|S_{21}^{10}|}, \tag{11.7}$$

and this may be evaluated for i = 1 , 2, . . . , 12, the only possible values in 
this case. A comparison of the observed values and the expected values 
calculated from (11.7) is given in Table 11.3. It appears very difficult to 
maintain the neutral theory in the light of this comparison. 















i    1     2     3     4     5     6     7     8     9     10    11    12
E    5.2   2.1   1.1   0.7   0.4   0.2   0.1   0.1   0.0   0.0   0.0   0.0
O    9     0     0     0     0     0     0     0     0     0     0     1

Table 11.3. Comparison of expected (E) and observed (O) sample frequency
spectra: Data of Coyne (1976). See text for details.
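
The conditional mean frequency spectrum is straightforward to compute with exact integer arithmetic. The sketch below assumes the form E(A_i | k, n) = n! |S_{n−i}^{k−1}| / (i (n−i)! |S_n^k|) for the conditional mean, and evaluates it for n = 21 and k = 10 so that the output can be compared with the E row of Table 11.3. Two identities provide a check on the formula: the means must sum to k, and the means weighted by i must sum to n. The function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def stirling1_unsigned(n, k):
    """|S_n^k|: the number of permutations of n objects having exactly k cycles."""
    row = [1]
    for m in range(1, n + 1):
        row = row + [0]
        row = [0] + [row[j - 1] + (m - 1) * row[j] for j in range(1, m + 1)]
    return row[k]


def conditional_mean_spectrum(n, k):
    """E(A_i | k, n) for i = 1, ..., n - k + 1 (larger i is impossible with k alleles)."""
    denom = stirling1_unsigned(n, k)
    return {
        i: Fraction(factorial(n) * stirling1_unsigned(n - i, k - 1),
                    i * factorial(n - i) * denom)
        for i in range(1, n - k + 2)
    }


spectrum = conditional_mean_spectrum(n=21, k=10)
for i, mean in spectrum.items():
    print(i, round(float(mean), 1))
```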

A second approach (Ewens (1973)) provides a formal test of hypothesis,
but focuses only on the number $A_1$ of singleton alleles in the sample. This
procedure originally assumed selective neutrality and was used to test for a
recent increase in the mutation rate. However, it may equally well be used
as a test of neutrality itself if a constant mutation rate is assumed,
especially for any test in which the alternative selective hypothesis of interest
would lead to a large number of singleton alleles. The procedure may be
generalized by using as test statistic the total number of singleton,
doubleton, ..., j-ton alleles, leading to a test in which the selective alternative
implies a significantly large number of low-frequency alleles. A parallel
procedure, using the frequency of the most frequent allele in the data, may also
be used.

We describe here only the test based on the number $A_1$ of singleton
alleles. The total number k of alleles in the sample is taken as given, and
the test is based on the neutral theory conditional distribution of $A_1$, given
k and n. (It is assumed, as is always the case in practice, that n strictly
exceeds k.) This conditional distribution is independent of θ and is found
(Ewens (1973)) from (9.30) to be

$$\text{Prob}(A_1 = a \mid k, n) = \sum_{j=a}^{k-1} (-1)^{j-a}\, \frac{n!}{a!\,(j-a)!\,(n-j)!}\, \frac{|S_{n-j}^{k-j}|}{|S_n^k|}. \tag{11.8}$$

Here $S_n^k$ is a Stirling number as discussed above. The conditional mean
of $A_1$ is $n|S_{n-1}^{k-1}|/|S_n^k|$, and the distribution (11.8) is approximately Poisson,
with this mean. This observation enables a rapid approximate assessment of
whether the number of singleton alleles is a significantly large one, assuming
selective neutrality.
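
The quality of the Poisson approximation can be examined numerically. The sketch below computes the conditional distribution of the number of singleton alleles by inclusion and exclusion over sets of singletons, namely Prob(A_1 = a | k, n) = Σ_j (−1)^{j−a} C(j, a) C(n, j) |S_{n−j}^{k−j}| / |S_n^k|, the sum being over j = a, ..., k − 1. This is offered as an assumed form of the distribution (11.8). The values n = 21 and k = 10 are used only as an illustration.

```python
from fractions import Fraction
from math import comb, exp, factorial


def stirling1_unsigned(n, k):
    """|S_n^k|: the number of permutations of n objects having exactly k cycles."""
    row = [1]
    for m in range(1, n + 1):
        row = row + [0]
        row = [0] + [row[j - 1] + (m - 1) * row[j] for j in range(1, m + 1)]
    return row[k]


def singleton_distribution(n, k):
    """Prob(A_1 = a | k, n) for a = 0, ..., k - 1, assuming n > k."""
    denom = stirling1_unsigned(n, k)
    dist = []
    for a in range(k):
        total = sum(
            (-1) ** (j - a) * comb(j, a) * comb(n, j) * stirling1_unsigned(n - j, k - j)
            for j in range(a, k)
        )
        dist.append(Fraction(total, denom))
    return dist


n, k = 21, 10
dist = singleton_distribution(n, k)
mean = sum(a * p for a, p in enumerate(dist))
for a, p in enumerate(dist):
    poisson = exp(-float(mean)) * float(mean) ** a / factorial(a)
    print(a, round(float(p), 4), round(poisson, 4))
```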



11.2.4 Age-Dependent Tests

The Watterson test of neutrality described above ultimately depends on
the sampling distribution (3.83) as its basis. This distribution treats all
alleles on an equal footing, and does not, for example, use age-order
information about alleles, even if this information is available. This is confirmed
by observing that a test statistic equivalent to the Watterson statistic f
(defined in (11.4)) is the variance-like quantity

$$f^* = \sum_{j=1}^{k} \Bigl(n_j - \frac{n}{k}\Bigr)^2,$$

which is a linear function of f and whose significance points are the same
linear function of those of f. The fact that the Watterson test treats all
alleles on an equal footing is evident from the definition of f*. For given n
and k this statistic in effect compares each $n_j$ with the conditional mean
n/k, the same mean being used for each $n_j$.

However, it is well known that the probability distribution (3.83) predicts
rather unequal numbers of the $n_j$ values. The reason for this is that different
alleles in a population at any time entered that population at various times
in the past, and an “old” allele has a greater chance of attaining a high
frequency than a “new” allele. This raises the possibility that if the age
order of the alleles in the sample is known, a procedure using this age
order, based on the age-ordered analogue (9.98) of (3.83), can be used to
arrive at a testing procedure more powerful than Watterson’s.

The observed number of alleles k in a sample of n genes is a sufficient
statistic for θ in the age-ordered distribution (9.98), so that the conditional
distribution of the age-ordered allele numbers $n_{(1)}, n_{(2)}, \ldots, n_{(k)}$, given n
and k, is

$$\frac{(n-1)!}{|S_n^k|\, n_{(k)}\,\bigl(n_{(k)} + n_{(k-1)}\bigr) \cdots \bigl(n_{(k)} + n_{(k-1)} + \cdots + n_{(2)}\bigr)}. \tag{11.9}$$

This implies that an exact test of the neutrality theory is possible, using
the observed values of the age-ordered alleles.
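
The age-ordered conditional distribution can be checked numerically for small samples. The sketch below takes the distribution to be (n−1)! divided by |S_n^k| n_(k) (n_(k) + n_(k−1)) ... (n_(k) + ... + n_(2)), with n_(1) the number of genes of the oldest allele. It confirms that the probabilities sum to 1 over all ordered configurations and computes the conditional means E(n_(i)). The values n = 8 and k = 3 and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def stirling1_unsigned(n, k):
    """|S_n^k|: the number of permutations of n objects having exactly k cycles."""
    row = [1]
    for m in range(1, n + 1):
        row = row + [0]
        row = [0] + [row[j - 1] + (m - 1) * row[j] for j in range(1, m + 1)]
    return row[k]


def age_ordered_probability(counts):
    """Probability of (n_(1), ..., n_(k)), oldest allele first, given n and k."""
    n, k = sum(counts), len(counts)
    denom = stirling1_unsigned(n, k)
    partial = 0
    for c in reversed(counts[1:]):      # n_(k), then n_(k) + n_(k-1), ..., down to n_(2)
        partial += c
        denom *= partial
    return Fraction(factorial(n - 1), denom)


def compositions(n, k):
    """All ordered k-tuples of positive integers summing to n."""
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        yield tuple(bounds[i + 1] - bounds[i] for i in range(k))


n, k = 8, 3
probs = {c: age_ordered_probability(c) for c in compositions(n, k)}
total = sum(probs.values())
means = [sum(c[i] * p for c, p in probs.items()) for i in range(k)]
print(total, [float(m) for m in means])
```

The conditional means decrease with the age rank in this example, in line with the remark that an older allele has a greater chance of attaining a high frequency.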

This possibility was investigated in detail by Tavaré et al. (1989),
who considered a number of possible test statistics, each a function of
$n_{(1)}, n_{(2)}, \ldots, n_{(k)}$. Perhaps the most natural of these is the “age-ordered”
analogue of f*, namely,

$$f_a^* = \sum_{i=1}^{k} \bigl(n_{(i)} - E(n_{(i)})\bigr)^2,$$

where $E(n_{(i)})$ can be found from (11.9). Perhaps surprisingly, it is found
that $f_a^*$ has poor properties as a test statistic of neutrality, and further,
that none of the age-ordered test statistics considered performed better
than the Watterson statistic, which does not use age-order information.
Thus, for this model, age-order information does not appear to be useful
in testing for selective neutrality.



11.3 Testing in the Infinitely Many Sites Models 

11.3.1 Introduction 

Since the complete nucleotide sequences of genes are now available in large 
numbers, and since these data represent an ultimate state of knowledge 
of the gene, tests of neutrality based on infinitely many alleles theory are 
increasingly only of historical interest, and it is natural to focus instead on 
nucleotide sequence data for testing neutrality. 

The nature of the testing procedure depends on the nature of the se- 
quence data used in the test. In some cases the data consist of a sample of 
n DNA sequences, each being the sequence of one single gene. The theory 
for such data assumes that there is no recombination between the sites 
in the gene considered. The discussion in Sections 11.3.3-11.3.6 considers 
this case. Thus the sampling theory in these sections needed for developing 
tests of neutrality is based largely on the results of Section 9.6, for which 
completely linked segregating sites are assumed. 

In other cases the data analyzed consist of DNA sequences from several 
unlinked genes, with several segregating sites arising within each gene. This
case is discussed in Section 11.3.7. For data of this form the theory of
Section 9.6 is needed for those sites within any one gene, but further theory 
is needed for the “between unlinked genes” aspect of the data. 

An even more extreme case arises with data from segregating sites that 
are all unlinked: This case is discussed in Section 11.3.8. Here the theory 
of Section 9.6 is not needed. 

As for tests using infinitely many alleles theory, discussed above, it is 
assumed in all the calculations in this section that selective neutrality holds, 
so that these can be thought of as “null hypothesis” calculations. The 
notation of Section 9.6 is used throughout this section for these calculations. 

It is appropriate to discuss here the broad nature of the testing
procedures described below, at least those used in Sections 11.3.3-11.3.6,
for which segregating sites with no recombination are assumed. It was
remarked in Section 9.6 for such sites that when selective neutrality holds,
the number S of sites segregating in the sample is not a sufficient
statistic for the central parameter θ describing the stochastic behavior of the
evolution of these sequences. Indeed, there is no simple nontrivial sufficient
statistic for θ for this case. This implies that no direct analogue of the exact
infinitely many alleles tests considered in Section 11.2.2 is possible.

On the other hand, in the infinitely many sites model there are several
unbiased estimators of θ when neutrality holds, as discussed below in
Section 11.3.2. The basic idea behind all of the tests described in Sections
11.3.3-11.3.6 is to form a statistic whose numerator is the difference
between two such unbiased estimators and whose denominator is an estimate
of the standard deviation of this difference. Although under neutrality the
observed values of these two estimators should tend to be close, since they
are both unbiased estimators of the same quantity, under selection they
should tend to differ, since the estimators on which they are based tend to
differ under selection, and in predictable ways. Thus values of the statistic
formed sufficiently far from zero lead to rejection of the neutrality
hypothesis. To find the sampling properties of these statistics it is necessary first
to discuss properties of the various unbiased estimators of θ used in them.

11.3.2 Estimators of θ

In this section we consider properties of four statistics that in the neutral
case are all unbiased estimators of the parameter θ. The theory considered
in this section concerns only the case of completely linked segregating sites.

The first unbiased estimator of θ that we consider is that based on the
number S of segregating sites, namely, the estimator $\hat\theta_S$ given in (9.59).
This estimator was discussed in some detail in Section 9.6.2: In particular,
the variance of this estimator is given in (9.61) for the completely linked
sites case.

The second unbiased estimator is based on (9.49). Suppose that the
nucleotide sequences of genes i and j in the sample are compared and differ at
some random number T(i, j) of sites. Then T(i, j) is an unbiased estimator
of θ. It is natural to consider all $\binom{n}{2}$ possible comparisons of two nucleotide
sequences in the sample and to form the statistic

$$\frac{\sum_{i<j} T(i, j)}{\binom{n}{2}}. \tag{11.10}$$

Since this is also an unbiased estimator of θ, we think of it as forming the
unbiased estimator $\hat\theta_T$, defined by

$$\hat\theta_T = \frac{\sum_{i<j} T(i, j)}{\binom{n}{2}}. \tag{11.11}$$

This estimator of θ was proposed by Tajima (1983). It is a poor estimator
of θ in that its variance, namely,

$$\frac{n+1}{3(n-1)}\,\theta + \frac{2(n^2+n+3)}{9n(n-1)}\,\theta^2 = b_1\theta + b_2\theta^2, \tag{11.12}$$

does not approach 0 as the sample size n increases. However, our interest
here in this estimator is that it forms part of a hypothesis testing procedure,
and not as a possible estimator of θ.

The third unbiased estimator of θ follows from (9.66). This equation
shows that the mean number of “singleton” sites, that is, sites where one
nucleotide arises once and another n − 1 times, is nθ/(n − 1). If M is the
observed number of such sites, then clearly,

$$\hat\theta_M = \frac{(n-1)M}{n} \tag{11.13}$$

is an unbiased estimator of θ. The variance of M is



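$$\frac{n}{n-1}\,\theta + \Bigl(\frac{2g_1}{n-1} - \frac{1}{(n-1)^2}\Bigr)\theta^2 \tag{11.14}$$

(Fu and Li, 1993), where $g_1$ is defined in (9.54). This implies that the
variance of $\hat\theta_M$ is

$$\frac{n-1}{n}\,\theta + \Bigl(\frac{2(n-1)g_1}{n^2} - \frac{1}{n^2}\Bigr)\theta^2. \tag{11.15}$$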

The fourth unbiased estimator of θ was proposed by Fay and Wu
(2000). This estimator is based on the assumption that of two
segregating nucleotides at any site, the mutant nucleotide can be recognized. The
conditional mutant frequency spectrum (9.65) shows that if there are j
representatives of this mutant nucleotide at a given site, the mean of $j^2$ is
$n(n-1)/(2g_1)$. The mean number of segregating sites is $g_1\theta$. We define U
as the sum of the squares of the various observed numbers of the mutant
nucleotide at the various segregating sites observed, summed over all
segregating sites. It then follows that the mean value of U is n(n − 1)θ/2. This
leads to the unbiased estimator $\hat\theta_H$, defined by

$$\hat\theta_H = \frac{2U}{n(n-1)}. \tag{11.16}$$



These four estimators have been used to form tests for neutrality, as
described in Sections 11.3.3, 11.3.4, and 11.3.5 below, and their properties
as estimators of θ are a central part of the test procedures described. It
was observed in Section 9.6.2 that estimators of θ having better properties
than these are possible if historical information is available, or can be
unambiguously inferred, concerning the evolutionary process leading to the
data observed. This fact suggests that better test procedures might be
possible if historical information can be employed. This matter is discussed
further in Section 11.3.4.
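
The four estimators can all be computed from the numbers of copies of the mutant (derived) nucleotide at the segregating sites. The sketch below is illustrative only. It assumes that the estimator of (9.59) is S/g_1, with g_1 = 1 + 1/2 + ... + 1/(n−1), and it uses the fact that a site with j copies of the mutant nucleotide contributes j(n − j) pairwise differences. The data set is hypothetical.

```python
from fractions import Fraction


def theta_estimators(derived_counts, n):
    """Return the estimates based on S, T, M, and U from derived-nucleotide counts.

    derived_counts[s] is the number of the n sampled sequences carrying the
    mutant nucleotide at segregating site s (between 1 and n - 1).
    """
    g1 = sum(Fraction(1, j) for j in range(1, n))
    S = len(derived_counts)                              # number of segregating sites
    theta_S = S / g1                                     # assumed form of (9.59)
    pairs = Fraction(n * (n - 1), 2)
    theta_T = sum(Fraction(j * (n - j)) for j in derived_counts) / pairs
    M = sum(1 for j in derived_counts if j in (1, n - 1))   # singleton sites
    theta_M = Fraction((n - 1) * M, n)
    U = sum(j * j for j in derived_counts)               # sum of squared mutant counts
    theta_H = Fraction(2 * U, n * (n - 1))
    return theta_S, theta_T, theta_M, theta_H


# Hypothetical sample of n = 5 sequences with four segregating sites.
estimates = theta_estimators([1, 1, 2, 4], n=5)
print([float(e) for e in estimates])
```

Note that the estimator based on M counts a site with n − 1 copies of the mutant nucleotide as a singleton site, since one nucleotide then arises once and the other n − 1 times.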



11.3.3 The Tajima Test 

It was observed at the end of Section 11.3.1 that most tests of neutrality using
data from completely linked segregating sites depend on the difference of
two unbiased estimators of θ. By far the most frequently used test based
on such a difference is that devised by Tajima (1989), which compares the
values of $\hat\theta_T$ and $\hat\theta_S$, defined respectively in (11.11) and (9.59). Specifically,
the procedure is carried out in terms of the statistic D, defined by

$$D = \frac{\hat\theta_T - \hat\theta_S}{\sqrt{\hat V}}, \tag{11.17}$$

where $\hat V$ is an unbiased estimate of the variance of $\hat\theta_T - \hat\theta_S$ and is defined in
(11.19) below. Tajima (1989) showed, by using adroit coalescent arguments,
that the variance V of $\hat\theta_T - \hat\theta_S$ is

$$V = c_1\theta + c_2\theta^2, \tag{11.18}$$

where

$$c_1 = b_1 - \frac{1}{g_1}, \quad c_2 = b_2 - \frac{n+2}{g_1 n} + \frac{g_2}{g_1^2},$$

with $g_1$ and $g_2$ defined respectively in (9.54) and (9.56) and $b_1$ and $b_2$ defined
implicitly in (11.12). Since this variance depends on θ, any estimate of this
variance depends on a choice of an estimate of θ.

The variance of the estimator $\hat\theta_S$ decreases to 0 as the sample size
increases (although the decrease is very slow), so the Tajima procedure is
to estimate the variance of $\hat\theta_T - \hat\theta_S$ by the function of S that provides an
unbiased estimator of the variance (11.18). Elementary statistical theory
shows that this function is

$$\hat V = \frac{c_1 S}{g_1} + \frac{c_2 S(S-1)}{g_1^2 + g_2}. \tag{11.19}$$

This is then used in the D statistic given in (11.17) above.

The next problem is to find the null hypothesis distribution of D.
Although D is broadly similar in form to a z-score, it does not have a normal
distribution and its mean is not zero, nor is its variance 1, since the
denominator of D involves a variance estimate rather than a known variance.
Further, the distribution of D depends on the value of θ, which is in
practice unknown. Thus there is no null hypothesis distribution of D invariant
over all θ values.
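
The statistic D is simple to compute once n, S, and the mean number of pairwise differences are known. The sketch below assumes that g_1 and g_2 are the sums of 1/j and 1/j^2 over j = 1, ..., n − 1, that the estimator based on S is S/g_1, and that b_1 and b_2 are the coefficients of θ and θ^2 in (11.12). The numerical example (n = 10, S = 16, mean pairwise difference 3.888889) is hypothetical.

```python
from math import sqrt


def tajima_constants(n):
    """Return g1, g2, c1, c2 for a sample of n sequences."""
    g1 = sum(1 / j for j in range(1, n))
    g2 = sum(1 / j**2 for j in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / g1
    c2 = b2 - (n + 2) / (g1 * n) + g2 / g1**2
    return g1, g2, c1, c2


def tajima_d(n, S, theta_T):
    """Tajima's D from the sample size, the number of segregating sites S,
    and the mean number of pairwise differences theta_T."""
    g1, g2, c1, c2 = tajima_constants(n)
    v_hat = c1 * S / g1 + c2 * S * (S - 1) / (g1**2 + g2)   # estimate of the variance
    return (theta_T - S / g1) / sqrt(v_hat)


d = tajima_d(n=10, S=16, theta_T=3.888889)
print(round(d, 3))
```

A negative value, as here, indicates that the estimate based on pairwise differences falls short of that based on the number of segregating sites.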

The Tajima procedure approximates the null hypothesis distribution of
D in the following way. First, the smallest value that D can take arises
when there is a singleton nucleotide at each site segregating. In this case
$\hat\theta_T$ is 2S/n, and the numerator in D is then $\{(2/n) - (1/g_1)\}S$. In this case
the value of D approaches a, defined by

$$a = \frac{\{(2/n) - (1/g_1)\}\sqrt{g_1^2 + g_2}}{\sqrt{c_2}}, \tag{11.20}$$

as the value of S approaches infinity.

The largest value that D can take arises when there are n/2 nucleotides
of one type and n/2 nucleotides of another type at each site (for n even) or
when there are (n − 1)/2 nucleotides of one type and (n + 1)/2 nucleotides of
another type at each site (for n odd). In this case the value of D approaches
b, defined by

$$b = \frac{\{n/(2(n-1)) - (1/g_1)\}\sqrt{g_1^2 + g_2}}{\sqrt{c_2}}, \tag{11.21}$$

when n is even and the value of S approaches infinity. A similar formula
applies when n is odd.

Second, it is assumed, as an approximation, that the mean of D is 0 and
the variance of D is 1. Finally, it is also assumed that the density function
of D is the generalized beta distribution over the range (a, b), defined by

$$f(D) = \frac{\Gamma(\alpha+\beta)\,(b-D)^{\alpha-1}(D-a)^{\beta-1}}{\Gamma(\alpha)\,\Gamma(\beta)\,(b-a)^{\alpha+\beta-1}}, \tag{11.22}$$

with the parameters α and β chosen so that the mean of D is indeed 0 and
the variance of D is indeed 1. This leads to the choice

$$\alpha = -\frac{(1+ab)\,b}{b-a}, \quad \beta = \frac{(1+ab)\,a}{b-a}.$$

This approximate null hypothesis distribution is then used to assess
whether any observed value of D is significant.
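
The limits a and b and the beta parameters can be computed as in the following sketch, which also confirms that the resulting distribution has mean 0 and variance 1. It assumes the limiting forms a = {(2/n) − (1/g_1)}√(g_1^2 + g_2)/√c_2 and, for n even, b = {n/(2(n−1)) − (1/g_1)}√(g_1^2 + g_2)/√c_2, together with α = −(1 + ab)b/(b − a) and β = (1 + ab)a/(b − a). For n odd the sketch uses (n + 1)/(2n) in place of n/(2(n − 1)), which is the mean pairwise difference per site for an (n − 1)/2 to (n + 1)/2 split. That odd-n form is derived here and is not taken from the text.

```python
from math import sqrt


def tajima_beta_parameters(n):
    """Return (a, b, alpha, beta) for the beta approximation to the distribution of D."""
    g1 = sum(1 / j for j in range(1, n))
    g2 = sum(1 / j**2 for j in range(1, n))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c2 = b2 - (n + 2) / (g1 * n) + g2 / g1**2
    scale = sqrt(g1**2 + g2) / sqrt(c2)
    a = (2 / n - 1 / g1) * scale                      # all sites singletons
    if n % 2 == 0:
        per_site = n / (2 * (n - 1))                  # n/2 to n/2 split at every site
    else:
        per_site = (n + 1) / (2 * n)                  # derived here for odd n
    b = (per_site - 1 / g1) * scale
    alpha = -(1 + a * b) * b / (b - a)
    beta = (1 + a * b) * a / (b - a)
    return a, b, alpha, beta


def beta_mean_and_variance(a, b, alpha, beta):
    """Mean and variance of the density proportional to (b-D)^(alpha-1) (D-a)^(beta-1)."""
    total = alpha + beta
    mean = a + (b - a) * beta / total
    variance = (b - a) ** 2 * alpha * beta / (total**2 * (total + 1))
    return mean, variance


a, b, alpha, beta = tajima_beta_parameters(10)
print(a, b, alpha, beta)
print(beta_mean_and_variance(a, b, alpha, beta))
```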

The various approximations listed above deserve further comment. First, 
the use of an asymptotically large value of S in the computation of a and b 
is questionable, since the segregating sites are all assumed to arise within 
the same gene. Second, the mean of D is known not to be 0, and the 
variance of D is known not to be 1. Finally, the beta distribution is used
as a mathematical convenience rather than because it follows from any
theoretical considerations. These comments imply that the adequacy of 
the distribution (11.22) as the null hypothesis distribution of D has to be 
examined. 

Tajima (1989) investigated the implications of the various
approximations listed above for a range of values of n and θ by simulating the
distribution of D when neutrality holds. These simulations show that the
null hypothesis mean of D is negative (but generally small) and the null
hypothesis variance of D less than 1, typically being in the range 0.72 to
0.98 for the values of n and θ considered (n = 5, 10, 20, 30, θ = 1, 10, 100).
The beta distribution approximation is not accurate for very small values
of n for the case θ = 1, but appears to be far more accurate for larger
values of n. This latter result is a desirable feature of the procedure since
sample sizes less than 20 cannot be expected to provide a test of neutrality
with any significant power.
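
A simulation of this kind can be sketched as follows. This is not Tajima's program. It is a minimal neutral coalescent simulation for completely linked sites, in which time is scaled so that each pair of lineages coalesces at rate 1 and mutations arise along each branch at rate θ/2. The constants g_1, g_2, c_1, and c_2 are assumed to be as in Tajima (1989), and replicates with no segregating sites are discarded since D is then undefined. The choices n = 20, θ = 10, and 2000 replicates are arbitrary.

```python
import random
from math import sqrt
from statistics import mean, pvariance


def simulate_mutant_counts(n, theta, rng):
    """Mutant-nucleotide counts at the segregating sites of one neutral genealogy."""
    lineages = [1] * n                  # sampled sequences descending from each lineage
    sites = []
    while len(lineages) > 1:
        k = len(lineages)
        t = rng.expovariate(k * (k - 1) / 2)        # time during which k lineages exist
        for size in lineages:
            # Poisson(theta * t / 2) mutations, counted as unit-rate arrivals before mu.
            mu = theta * t / 2
            arrival = rng.expovariate(1.0)
            while arrival < mu:
                sites.append(size)
                arrival += rng.expovariate(1.0)
        i, j = rng.sample(range(k), 2)              # the two lineages that coalesce
        merged = lineages[i] + lineages[j]
        lineages = [s for idx, s in enumerate(lineages) if idx not in (i, j)] + [merged]
    return sites


def tajima_d_from_counts(counts, n):
    g1 = sum(1 / j for j in range(1, n))
    g2 = sum(1 / j**2 for j in range(1, n))
    c1 = (n + 1) / (3 * (n - 1)) - 1 / g1
    c2 = 2 * (n * n + n + 3) / (9 * n * (n - 1)) - (n + 2) / (g1 * n) + g2 / g1**2
    S = len(counts)
    theta_T = sum(2 * j * (n - j) for j in counts) / (n * (n - 1))
    v_hat = c1 * S / g1 + c2 * S * (S - 1) / (g1**2 + g2)
    return (theta_T - S / g1) / sqrt(v_hat)


rng = random.Random(2024)
n, theta = 20, 10.0
d_values = []
for _ in range(2000):
    counts = simulate_mutant_counts(n, theta, rng)
    if counts:                                      # D is undefined when S = 0
        d_values.append(tajima_d_from_counts(counts, n))
print(mean(d_values), pvariance(d_values))
```

Empirical significance points for a given n and θ can be read off from the sorted simulated values, which is the approach taken in the simulation studies discussed in this section.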

The problem of finding more accurate significance points of D than are
provided by (11.22) was also addressed by Fu and Li (1993), who used
simulations with known values of θ and n to find significance points of D
empirically. The values of θ considered ranged from 2 to 20, and the most
extreme critical value for this range of values was chosen. This approach
suffers from the problem that this range of values of θ might not correspond
to the value of θ appropriate to the data at hand.

Simonsen et al. (1995) conducted a detailed examination of the accuracy 
of (11.22) as the null hypothesis distribution of D. Perhaps the main con- 
clusion that they found is that the critical significance points found from 
(11.22) are often too conservative. While a conservative test is less likely to 
reject the null hypothesis incorrectly, it necessarily involves a loss of power, 
so that in this case the Tajima procedure might lead to acceptance of the 
null hypothesis of neutrality when in fact significant selection exists. This 
observation agrees with the fact that the true variance of D is less than 1, 
the value assumed in the distribution (11.22). 

Simonsen et al. (1995) considered an approach that does not depend
on an arbitrary range of θ values, as does that of Fu and Li (1993). In
this approach a 1 − β confidence interval $(\theta_L, \theta_U)$ for θ is found from the
cumulative form of the Tavaré (1984) distribution for S given in (9.58).
The value of β is chosen to be less than the Type I error α used for the
test of hypothesis of neutrality. Standard statistical theory shows that the
1 − β confidence interval $(\theta_L, \theta_U)$ for θ is found by solving the equations

$$F(s-1, \theta_L) = 1 - \beta/2, \quad F(s, \theta_U) = \beta/2 \tag{11.23}$$

(Simonsen et al. (1995)), with F(s, θ) defined in (9.58). They then
considered a grid of values of θ in this confidence interval and estimated the α − β
significance points of the Tajima statistic for each value of θ in this grid,
and then used the maximum upper and the minimum lower significance
points taken over all values of θ considered. Statistical theory (Berger and
Boos (1994)) shows that this procedure gives a test of hypothesis of
neutrality with Type I error α. Simonsen et al. (1995) then used this procedure
to arrive at adjusted significance points for the Tajima test. These imply a
slightly less conservative procedure than the original Tajima test.

The Tajima procedure applies when complete linkage between sites
obtains, since the coalescent theory used to find the variance of $\hat\theta_T - \hat\theta_S$ given
in (11.18) assumes such complete linkage. Under complete linkage the
infinitely many sites model reduces to the infinitely many alleles model, so
that the infinitely many alleles testing theory of Section 11.2 may in princi- 
ple be applied, where distinct alleles are now distinct nucleotide sequences. 
Because the Tajima procedure makes use of the actual DNA sequences it 
may be expected to provide a more efficient testing procedure than that 
based on the infinitely many alleles theory. 



11.3.4 Other “Tajima-like” Testing Procedures

The numerator of the Tajima test statistic is $\hat\theta_T - \hat\theta_S$. It is also possible to
form test statistics whose numerators are $\hat\theta_T - \hat\theta_M$ and $\hat\theta_S - \hat\theta_M$, where $\hat\theta_M$
is defined in (11.13). All three of these differences have a variance of the
form $A\theta + B\theta^2$, for some constants A and B depending only on n and the
difference in question. The variances of $\hat\theta_S$, $\hat\theta_T$, and $\hat\theta_M$ are also quadratic
functions of θ (see (9.61), (11.12), and (11.15)), and thus any one of $\hat\theta_S$,
$\hat\theta_T$, and $\hat\theta_M$ can be used to give an unbiased estimate of the variance of any of
the three differences given above. This implies that there are nine
“Tajima-like” test statistics possible, of which the Tajima statistic described above
is one.

The properties of these nine test statistics have been investigated by
Simonsen et al. (1995). They all have the property that their null
hypothesis distributions depend on θ and that, even if θ were known, they
have complicated distributions that are best approached through
simulation. The broad conclusion of the investigations of Simonsen et al. is that
the Tajima statistic has the best operating characteristics of all nine
statistics. Because of this, we do not consider the remaining eight procedures
further.

Of the above nine test statistics, three use a variance estimate based 
on S, and these three are the natural ones to investigate in more detail. 
The Tajima statistic (11.17) is one of these three. The other two, with
numerators $\hat\theta_S - \hat\theta_M$ and $\hat\theta_T - \hat\theta_M$, denoted respectively by D* and F*, were
proposed by Fu and Li (1993) as possible test statistics of the neutrality
hypothesis. Fu and Li claim that these testing procedures are likely to be 
more powerful than the Tajima procedure, a claim contested by Tajima 
(1997). This matter will require more analysis before a resolution of this 
point can be reached.

A comparison not discussed above is that between $\hat\theta_T$ and $\hat\theta_H$, defined
in (11.16). The difference between these two estimators has been used as
the basis of a test of neutrality, designed specifically to test against an
alternative hypothesis of a selective sweep at a locus close to the locus
under consideration. This test is considered below, in Section 11.3.5.

It was remarked in Sections 9.6.2 and 11.3.2 that the estimators of θ used
in the various tests discussed above are not in principle optimum ones, and 
thus have larger variances than an estimator using historical information. 
This suggests that sharper tests of neutrality might be available if estima- 
tors that use historical information were employed. On the other hand, it 
was shown in Section 11.2.4 that in the infinitely many alleles case, tests 
of neutrality using age-order information do not perform better than tests 
that do not use this information. 

11.3.5 Testing for the Signature of a Selective Sweep 

The statistic used in any hypothesis testing procedure is in principle chosen 
so as to maximize the probability of rejecting the null hypothesis (in this 
case neutrality) in favor of whichever selective alternative is of interest. Ev- 
idence for this selective alternative is provided by some specific “signature” 
in the data. In this section we consider aspects of tests based on a neutral 
locus signature suggesting a recent selective sweep at some selected locus 
linked to this neutral locus, carrying with it the frequency of various alleles 
at the neutral locus. 

We assume a sample of n DNA sequences corresponding to n genes at a 
selectively neutral locus. The tests we consider are all based on the assump- 
tion that of two nucleotides segregating in the sample at any site in the 
neutral gene, the mutant (or derived) nucleotide can be recognized. With 
no recent selective sweep at a locus linked to the neutral locus under con- 
sideration, the mean number of sites at which there are j representatives 
of the derived nucleotide is $\theta/j$, as given by (9.62). Correspondingly, the 
mean number of sites at which the mutant nucleotide assumes a frequency 
in $(x, x + \delta x)$ is given, in the continuous approximation, by (9.64). 

Suppose now that a favored new mutant A arises at a locus closely linked 
to the neutral locus, and increases in frequency to 1 in a comparatively brief 
selective sweep. After the selective sweep has concluded, the frequency of 
the mutant nucleotide will tend to be high for those mutant nucleotides 
“hitchhiking” with the favored allele at the selective locus, or to be low 
for those mutant nucleotides not hitchhiking with the favored allele. The 
probability that any given mutant hitchhikes is the probability that the 
favored mutant arises on a gamete containing the mutant nucleotide, and 
this is the frequency x of the mutant nucleotide before the selective sweep. 
The probability of this frequency x is proportional to $x^{-1}$, as shown by the 
mutant nucleotide frequency spectrum (9.64). This will lead to a population 
frequency spectrum after the selective sweep different from the expression 
$\phi(x) = \theta/x$ given in (9.18). This new frequency spectrum depends on the Maynard 
Smith and Haigh (1974) quantity c defined in (6.98), and is given by Fay 
and Wu (2000) as 

$$\phi(x) = \frac{\theta}{x} - \frac{\theta}{c}, \quad 0 < x < c, \qquad (11.24)$$

$$\phi(x) = \frac{\theta}{c}, \quad 1 - c < x < 1, \qquad (11.25)$$

$$\phi(x) = 0, \quad c < x < 1 - c. \qquad (11.26)$$

Correspondingly, the selective sweep will tend to lead to an estimator $\hat\theta_H$ 
of $\theta$ based on this new frequency spectrum that will tend to be different 
from that based on the expression in (11.16). Fay and Wu (2000) then form 
a test for such a recent sweep based on the difference of the estimators 
$\hat\theta_H$ and $\hat\theta_T$, using as test statistic the quantity H, defined by 

$$H = \frac{\hat\theta_H - \hat\theta_T}{\sqrt{V}}, \qquad (11.27)$$

where V is an unbiased estimator of the variance of $\hat\theta_H - \hat\theta_T$. The null hypothesis 
to be tested is that no selective sweep such as that described has 
recently occurred, and the null hypothesis distribution of H was found by 
Fay and Wu by simulation, using the coalescent. This allows an assessment, 
for any particular case, of whether evidence exists for a recent hitchhiking 
event at a selected locus close to the neutral locus considered. Power prop- 
erties of this procedure are investigated by Przeworski (2002). A similar 
procedure, again using (11.24), is provided by Kim and Stephan (2002). 

The Fay and Wu (2000) procedure is an example of a procedure using 
as test statistic a quantity of the form 



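$$\frac{\hat\theta_1 - \hat\theta_2}{\sqrt{V}}, \qquad (11.28)$$

where V is an estimate of the variance of $\hat\theta_1 - \hat\theta_2$. In the Fay and Wu 
procedure $\hat\theta_1 = \hat\theta_H$, $\hat\theta_2 = \hat\theta_T$, and V is an estimate of the variance of $\hat\theta_H - \hat\theta_T$. 
The Tajima statistic (11.17) is also a case of a statistic of the form given 
in (11.28), with (in that case) $\hat\theta_1 = \hat\theta_T$, $\hat\theta_2 = \hat\theta_S$. Fu (1997) considered a 
variety of test statistics of the form given in (11.28). He focused attention 
on those cases in which $\hat\theta_1$ and $\hat\theta_2$ are linear functions of the form 

$$\hat\theta_1 = \sum_j \alpha_j X_j, \qquad \hat\theta_2 = \sum_j \beta_j Y_j, \qquad (11.29)$$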

where Xj is the number of segregating sites in the sample for which there 
are j representatives of the mutant nucleotide (assuming that this can be 
recognized) and Yj is the number of segregating sites in the sample for which 
there are j representatives of either the mutant or the original nucleotide. 
The means of Xj and Yj are given by (9.62) and (9.66). The constants $\{\alpha_j\}$ 
and $\{\beta_j\}$ are required to be chosen so that the neutral theory stationary 
means of $\hat\theta_1$ and $\hat\theta_2$ are both equal to $\theta$, so that the neutral theory stationary 
mean value of the numerator in (11.28) is 0. The estimators $\hat\theta_S$, $\hat\theta_H$, and $\hat\theta_T$ 
considered above satisfy these requirements, so that the Fay and Wu and 
the Tajima statistics are of the required form. 
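
As an illustration of estimators that are linear in the counts Xj, the following Python sketch computes three of them from a list of counts and checks that each returns $\theta$ when the counts are replaced by their neutral means $\theta/j$. The weights used are assumptions made for this sketch, since the defining equations are not reproduced in this section: the segregating sites estimator divides the number of segregating sites by $\sum_{j=1}^{n-1} 1/j$, the Tajima estimator uses weights $j(n-j)/\binom{n}{2}$, and the Fay and Wu estimator uses weights $j^2/\binom{n}{2}$. The function names are illustrative only.

```python
from math import comb


def theta_estimators(counts):
    """Return (theta_S, theta_T, theta_H) from derived-allele site counts.

    counts[j - 1] is the number of segregating sites at which the derived
    nucleotide appears j times in the sample, for j = 1, ..., n - 1.
    The weights below are the assumed standard forms, not taken from the text.
    """
    n = len(counts) + 1
    pairs = comb(n, 2)
    g1 = sum(1.0 / j for j in range(1, n))
    theta_s = sum(counts) / g1
    theta_t = sum(j * (n - j) * x for j, x in enumerate(counts, start=1)) / pairs
    theta_h = sum(j * j * x for j, x in enumerate(counts, start=1)) / pairs
    return theta_s, theta_t, theta_h


def neutral_mean_counts(n, theta):
    """Neutral theory mean of X_j, namely theta / j, for j = 1, ..., n - 1."""
    return [theta / j for j in range(1, n)]


if __name__ == "__main__":
    # With the neutral means as input, all three estimators should agree.
    print(theta_estimators(neutral_mean_counts(10, 3.0)))
```

An excess of sites with the derived nucleotide at high frequency, as expected after hitchhiking, inflates the third estimator relative to the second, which is the contrast exploited by the Fay and Wu procedure.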

Fu (1997) then considered various statistics of the form of (11.28), assessing 
their properties as test statistics for testing for hitchhiking from 
a selective sweep at a linked selected locus and also for testing for recent 
population growth. Once again, the coalescent is used to find empirical sig- 
nificance points of these statistics, leading to a comparison of their power 
properties for these tests. 

If a hitchhiking event did in fact occur, can we estimate the time since it 
concluded? Perlitz and Stephan (1997) suggest an approach to estimating 
this time that depends on the assumption that $\theta$ is known, or at least can be 
estimated or assumed to lie in some interval of values. The mean number of 
segregating sites in a sample of n genes if there was no selective sweep in the 
recent past is given by the stationary value (9.53). One possible explanation 
for observing a number of segregating sites less than this mean is that a 
hitchhiking event concluded recently in the past and that the actual number 
of segregating sites has not had time to achieve a value close to its stationary 
value. Perlitz and Stephan (1997) find an expression for the mean number 
of segregating sites in the sample of n genes, given that a hitchhiking event 
occurred at time t in the past. This is a monotonically increasing function 
of t, as would be expected, and by equating this expected value to the 
observed number of segregating sites in the sample, an estimate of t may 
be found. 



11.3.6 Combining Infinitely Many Alleles and Infinitely 
Many Sites Approaches 

Strobeck (1987) proposed a test for population subdivision in the presence 
of neutrality that may equally be used as a test for neutrality in a random-mating 
population. The probability distribution of the number K of allelic 
types in the infinitely many alleles model is given by (3.84), and thus 

$$\text{Prob}(K \le k) = \sum_{i=1}^{k} \frac{|S_n^i|\,\theta^i}{S_n(\theta)}. \qquad (11.30)$$

We denote the left-hand side in (11.30) by T(K). If K were a continuous 
random variable, statistical theory would show that T(K) has a 
uniform distribution in (0, 1), so that, for example, for any value $\alpha$ in (0, 1), 
$\text{Prob}(T(K) \le \alpha) = \alpha$. 
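
The cumulative probability in (11.30) can be evaluated numerically. The sketch below assumes that (3.84) has the form $\text{Prob}(K = k) = |S_n^k|\theta^k / S_n(\theta)$, where $|S_n^k|$ is an unsigned Stirling number of the first kind and $S_n(\theta) = \theta(\theta+1)\cdots(\theta+n-1)$. The function names are illustrative, and the direct use of large integers limits this version to moderate sample sizes.

```python
def unsigned_stirling_row(n):
    """Return [|S_n^0|, ..., |S_n^n|], unsigned Stirling numbers of the first kind.

    Uses the recurrence |S_m^k| = |S_{m-1}^{k-1}| + (m - 1) |S_{m-1}^k|.
    """
    row = [1]  # row for n = 0
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            carry = row[k] if k < m else 0
            new[k] = row[k - 1] + (m - 1) * carry
        row = new
    return row


def prob_K_at_most(k, n, theta):
    """Prob(K <= k) for a sample of n genes, under the assumed form of (3.84)."""
    stirling = unsigned_stirling_row(n)
    rising = 1.0
    for i in range(n):
        rising *= theta + i  # S_n(theta) = theta (theta + 1) ... (theta + n - 1)
    return sum(stirling[i] * theta ** i for i in range(1, k + 1)) / rising
```

Because K is discrete, the values taken by this function jump between a finite set of levels, which is why the uniform distribution argument holds only approximately.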

The concept behind the Strobeck procedure is that in subdivided populations, 
the value of the infinitely many alleles estimate $\hat\theta_K$ given in (9.32) 
should differ from the value of the Tajima infinitely many sites estimate 
$\hat\theta_T$. Strobeck then suggested that in the neutral case, a suitable statistic to 
test for population subdivision is 

$$\sum_{i=1}^{k} \frac{|S_n^i|\,\hat\theta_T^{\,i}}{S_n(\hat\theta_T)}, \qquad (11.31)$$

and that the random-mating hypothesis be rejected (with Type I error $\alpha$) 
if the value of this statistic is less than $\alpha$. The corresponding procedure, 
in a random-mating population, would be to reject neutrality in favor of a 
selective alternative if the statistic were less than $\alpha$. 

Fu (1996) showed that this procedure does not provide a test with Type 
I error $\alpha$. Nevertheless, he used it as a basis for a test of neutrality using 
as test statistic the quantity W, defined by 

$$W = \sum_{i=1}^{k} \frac{|S_n^i|\,\hat\theta_S^{\,i}}{S_n(\hat\theta_S)}, \qquad (11.32)$$

where $\hat\theta_S$ is the estimate of $\theta$ given by (9.60), found from the number of 
segregating sites in the sample. 

The statistic W differs from the Strobeck statistic only in the estimate 
of $\theta$ used. Just as the Strobeck statistic does not have an approximately 
uniform distribution under neutrality, so also W does not have this distri- 
bution under neutrality. Recognizing this, Fu found an approximate neutral 
theory distribution for W using a logistic regression technique. We do not 
enter into the details, since complications arise (as for all infinitely many 
sites tests) because the distribution of W depends on $\theta$, so that procedures 
similar to those carried out by Simonsen et al. (1995) described in Section 
11.3.3 are needed. 

Fu (1997) has described a procedure using a statistic similar to the 
Strobeck statistic (11.31), but defined instead by 



$$Q = \sum_{i=k}^{n} \frac{|S_n^i|\,\hat\theta_T^{\,i}}{S_n(\hat\theta_T)}. \qquad (11.33)$$

If there are many rare alleles in the sample, $\hat\theta_T$ will tend to be less than $\hat\theta_S$, 
and as a result, Q will tend to be small and F, defined as $\log(Q/(1 - Q))$, will 
tend to be large and negative. Fu (1997) therefore chose F as an appropriate 
test statistic, aimed specifically at testing for a significantly large number 
of low-frequency alleles. (This procedure thus has the same aim as that 
discussed at the end of Section 11.2.3.) One purpose of this test procedure 
is to serve as a test for hitchhiking, as an alternative to that discussed in the previous section. 
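
A minimal sketch of the calculation of Q and F follows. Instead of Stirling numbers it uses the fact that, under the infinitely many alleles model, K is distributed as a sum of independent Bernoulli variables with success probabilities $\theta/(\theta + i)$, $i = 0, 1, \ldots, n - 1$, which gives the same distribution in a numerically more stable way. The function names and the argument `theta_t` (standing for the observed value of the Tajima estimate) are hypothetical.

```python
from math import log


def k_distribution(n, theta):
    """Return [Prob(K = 0), ..., Prob(K = n)] for a sample of n genes.

    Built by convolving independent Bernoulli(theta / (theta + i)) variables
    for i = 0, ..., n - 1.
    """
    dist = [1.0]
    for i in range(n):
        p = theta / (theta + i)
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1.0 - p)
            new[k + 1] += prob * p
        dist = new
    return dist


def fu_F(k_obs, n, theta_t):
    """F = log(Q / (1 - Q)), where Q = Prob(K >= k_obs) evaluated at theta_t."""
    q = sum(k_distribution(n, theta_t)[k_obs:])
    if not 0.0 < q < 1.0:
        # For example k_obs = 1 gives Q = 1, for which F is undefined.
        raise ValueError("Q must lie strictly between 0 and 1")
    return log(q / (1.0 - q))
```

A large observed number of allelic types relative to what the Tajima estimate predicts gives a small Q and hence a large negative F.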



11.3.7 Data from Several Unlinked Loci 

All the procedures described in Sections 11.3.3-11.3.6 assume data from 
completely linked sites. In practice, data are often analyzed for several 
genes, often unlinked, with several sites segregating within each gene. The 
data of Takano et al. (1991), for example, analyzed by Tajima (1997), 
are of this type. These data refer to four essentially unlinked genes (Adh, 
Amy, Pu, and Gpdh) in two populations (north and south) of Drosophila 
melanogaster. Sites within each gene may be taken as completely linked, but 
sites in different genes may be taken to be unlinked and to have independent 
evolutionary behavior. 

It is necessary, for data of this form, to consider further properties of the 
three estimators $\hat\theta_S$, $\hat\theta_T$, and $\hat\theta_M$ beyond those discussed in Section 11.3.2, 
and as a result to find the way in which the neutral hypothesis is to be 
tested for data from several unlinked genes. 

Tajima (1997) considered both these questions. When data arise from 
several genes, the definition of $\theta$ now involves the total mutation rate taken 
over all sites in all genes considered. We call this $\theta_{\text{sum}}$. This is the sum of 
the various individual $\theta$ values for the separate genes, so it is appropriate 
to simply sum the individual gene estimators of $\theta$ to obtain an estimator of 
$\theta_{\text{sum}}$. This can be done for all of the estimators of $\theta$ considered in Section 
11.3.2. 

For data deriving only from any one gene and location, the variance for- 
mulas for the three “single gene” estimators given in (9.61), (11.12), and 
(11.15) are appropriate, since these variances are calculated under the as- 
sumption of completely linked sites. However, these variance formulas are 
not appropriate for estimators of $\theta_{\text{sum}}$, since the complete linkage assumption 
under which they are derived is no longer appropriate. Tajima (1997) 
correctly uses the result that the variance of any estimator of $\theta_{\text{sum}}$ is the 
sum of the variances of the “separate gene” estimators of the “separate 
gene” $\theta$ values, each of which is given by the theory in Section 11.3.2. 

As far as the data of Takano et al. (1991) are concerned, it is interesting 
that even for the same gene at the same location, the numerical values of 
the estimates of $\theta$ often disagree considerably. This might arise because of 
random fluctuations arising in a neutral case from the very small sample 
size (n = 43) or because, in a selective case, selection affects the three 
estimators differently. 

This comment leads to a discussion of the revisions needed to the tests 
of neutrality discussed in Sections 11.3.3 and 11.3.4, and in particular, to 
revisions needed for the calculation of the statistics D, D*, and F*. The 
tests described above are not directly appropriate for data pooled over 
several genes, since the variance formulas assumed in the statistics are no 
longer correct when data from unlinked sites are considered. However, they 
are easily amended, in the following way. 

The numerator in each of the revised statistics following the general 
form of D, D*, and F* can be written as $\sum(\hat\theta_i - \hat\theta_j)$, where $\hat\theta_i$ and $\hat\theta_j$ 
are two different estimators of $\theta$ found from the segregating sites within 
one gene, and the sum is taken over all genes in the sample. Because the 
different genes in the sample are assumed to be unlinked and thus to evolve 
independently, the variance of any such sum is the sum, over genes, of the 
variances of the individual gene $\hat\theta_i - \hat\theta_j$ values. These variances are found 
from the single gene theory of Section 11.3.2. Denoting the sum of the 
corresponding variance estimates by $V_{ij}$, we obtain “Tajima-like” statistics 
of the form 

$$\frac{\sum(\hat\theta_i - \hat\theta_j)}{\sqrt{V_{ij}}}. \qquad (11.34)$$



As an approximation, the distribution of this statistic can be taken as being 
close to that of the approximate distribution discussed above for the Tajima 
statistic. 
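
The pooled calculation can be sketched in a few lines of Python. The sketch assumes that, for each gene, the difference between the two estimators and an estimate of the variance of that difference have already been obtained from the single gene theory; the function name and its inputs are hypothetical.

```python
from math import sqrt


def pooled_tajima_like(differences, variance_estimates):
    """Sum of per-gene estimator differences divided by the square root of
    the sum of the per-gene variance estimates (genes assumed unlinked)."""
    if len(differences) != len(variance_estimates):
        raise ValueError("one difference and one variance estimate per gene")
    v_ij = sum(variance_estimates)
    return sum(differences) / sqrt(v_ij)
```

Note that per-gene differences of opposite sign cancel in the numerator, so a pooled value near zero does not by itself show that every gene behaves neutrally.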

Although it is not explicitly stated, this appears to be the procedure 
adopted by Tajima (1997), since his calculated values of the test statistics 
agree well with those deriving from (11.34). 

For the data of Takano et al. (1991), the values of the three test statistics 
given by (11.34), for the choices ij = TS, ij = TM, and ij = SM, usually 
agree in sign but often disagree in numerical value. Further, they usually 
agree in sign for the four different genes considered. As was the case for the 
estimation of $\theta$, this might arise because of random effects, given the very 
small sample size, or because, in a selective case, the different statistics are 
sensitive to different forms of selection. 

The fact that the values of the test statistics usually agree in sign for 
the four genes considered raises an important point. It has been remarked 
several times above that tests of neutrality are in effect tests of neutrality 
together with the various often implicit assumptions made in the testing 
procedure. One of the latter assumptions, for example, is that there have 
not been any population size bottlenecks in the recent past. The 
effect of a recent bottleneck mimics the effect of selection. Thus if the values 
of a test statistic for selection show a consistent deviation from zero across 
a number of different genes, a plausible explanation is that the deviation is 
caused by a bottleneck, affecting all genes equally, rather than by selection. 
Testing for this form of explanation rather than selection will be discussed 
in Volume II. 

A further consideration associated with this is that the procedure using 
a test statistic of the form of (11.34) tests for overall selection over all gene 
loci considered. This might, however, not be an interesting test to perform. 
Further, an overall test such as this might mask selection if selection does 
act at the different gene loci, but causes negative values of $\hat\theta_i - \hat\theta_j$ at some 
loci and positive values at other loci. It is clear from this and the discussion 
of the previous paragraph that testing procedures using several gene loci 
together must be conducted with some care. 

11.3.8 Data from Unlinked Sites 



A situation even more extreme than that considered in Section 11.3.7 arises 
when the data arise from a number of unlinked segregating sites. While it 
is possible to extend the theory of Section 11.3.7 and to devise further 
“Tajima-like” testing procedures, for the case of unlinked sites a better 
approach is possible. For this case the number of segregating sites S is a 
sufficient statistic for $\theta$, and thus a conditional test, similar in spirit to 
those used for infinitely many alleles data, can be used. In this case the 
test statistic may be taken as some function of the frequencies of the two 
nucleotides at any segregating site. 

We assume that there are n nucleotides at each of s segregating sites in 
the sample. The probability distribution (9.63) of the number j of times 
that the mutant nucleotide is observed at any one of these sites shows that 
the mean and variance of the total number of times $H = \sum J$ that the 
mutant nucleotide arises at the various sites are, respectively, 

$$\mu_H = \frac{s(n-1)}{g_1} \quad \text{and} \quad \sigma_H^2 = \frac{sn(n-1)}{2g_1} - s\left(\frac{n-1}{g_1}\right)^2. \qquad (11.35)$$

If the mutant nucleotide at each site can be recognized, a z-like statistic of 
the form $(h - \mu_H)/\sigma_H$ can be formed and used to test for neutrality, where 
h is the observed value of H. 

If it is unknown which of two segregating nucleotides is the mutant, it is 
necessary to amend this procedure and use as test statistic a quantity that 
remains unaltered if j is replaced by n - j. One possibility is to replace 
the statistic H by the statistic $K = \sum J(n - J)$, the sum being taken over 
all the s sites segregating in the sample. The mean and variance of this 
statistic are 

$$\mu_K = \frac{sn(n-1)}{2g_1} \quad \text{and} \quad \sigma_K^2 = \frac{sn^2(n^2-1)}{12g_1} - s\left(\frac{n(n-1)}{2g_1}\right)^2. \qquad (11.36)$$






Once again, a z-like statistic can be formed and used to test for neutrality. 
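
The moments used in both z-like statistics can be checked by direct enumeration. The sketch below assumes that the distribution (9.63) assigns probability $j^{-1}/g_1$ to the value j, with $g_1 = \sum_{j=1}^{n-1} j^{-1}$; under that assumption the per-site score j gives the statistic for recognizable mutants, and the per-site score j(n - j) gives a statistic unchanged when j is replaced by n - j. All function names are illustrative.

```python
from math import sqrt


def site_score_moments(n, score):
    """Mean and variance of score(J) at one segregating site, where J has
    the assumed distribution Prob(J = j) = (1 / j) / g1, j = 1, ..., n - 1."""
    g1 = sum(1.0 / j for j in range(1, n))
    probs = {j: (1.0 / j) / g1 for j in range(1, n)}
    mean = sum(score(j) * p for j, p in probs.items())
    second = sum(score(j) ** 2 * p for j, p in probs.items())
    return mean, second - mean ** 2


def derived_score(j):
    """Score used when the mutant nucleotide can be recognized."""
    return j


def folded_score(n):
    """Score that is unchanged when j is replaced by n - j."""
    return lambda j: j * (n - j)


def z_like(site_counts, n, score):
    """z-like statistic for independent (unlinked) segregating sites."""
    s = len(site_counts)
    mean, var = site_score_moments(n, score)
    total = sum(score(j) for j in site_counts)
    return (total - s * mean) / sqrt(s * var)
```

For s unlinked sites the mean and variance of the total are s times the per-site values, which is how the z-like statistic is standardized here.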

The diffusion approximation for this procedure is described by Ewens 
(1979, Section 9.8). In the diffusion approximation, the distribution of j/n 
may be written as 

$$f(x) = x^{-1}\{\log(n-1)\}^{-1}, \quad n^{-1} < x < 1 - n^{-1}. \qquad (11.37)$$



If it is unknown which of two segregating nucleotides is the mutant, it is 
necessary to use as test statistic some function that remains unaltered if x 
is replaced by 1 - x. The most convenient such statistic is 

$$w = \frac{|\log\{(1-x)/x\}|}{\log(n-1)}. \qquad (11.38)$$

Under the approximation made above, this statistic has a uniform distribution 
on (0, 1) under the hypothesis of selective neutrality. Alternatively, under 
neutrality, 

$$y = -2\log w \qquad (11.39)$$

has a chi-square distribution with 2 degrees of freedom. If we are interested 
in heterotic selection, we reject the neutrality hypothesis for significantly 
large values of y. A suitable test statistic of neutrality against a heterosis 
alternative would be $\sum y$, which under neutrality has a chi-square distribution 
with 2s degrees of freedom, where s is the observed number of sites 
segregating in the sample. 
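
The following sketch carries out this calculation for a set of unlinked sites. It takes w to be $|\log\{(1-x)/x\}|/\log(n-1)$ and y to be $-2\log w$, and uses the closed form of the chi-square upper tail probability that is available when the number of degrees of freedom is even. The function names and the example frequencies are hypothetical, and a frequency of exactly 1/2 is excluded because it gives w = 0.

```python
from math import exp, log


def w_statistic(x, n):
    """Statistic that is unchanged when the frequency x is replaced by 1 - x."""
    return abs(log((1.0 - x) / x)) / log(n - 1)


def chi2_upper_tail_even(value, df):
    """Upper tail probability of a chi-square variable with even df = 2m:
    exp(-value / 2) * sum_{k=0}^{m-1} (value / 2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    term, total = 1.0, 0.0
    for k in range(df // 2):
        total += term
        term *= (value / 2.0) / (k + 1)
    return exp(-value / 2.0) * total


def heterosis_test(frequencies, n):
    """Return (sum of y, upper tail probability) for unlinked sites."""
    ys = [-2.0 * log(w_statistic(x, n)) for x in frequencies]
    stat = sum(ys)
    return stat, chi2_upper_tail_even(stat, 2 * len(ys))


if __name__ == "__main__":
    # Frequencies clustered near 1/2 give large y values and a small tail probability.
    print(heterosis_test([0.45, 0.5001, 0.55], 20))
```

The tail probability is valid only when the sites are unlinked, since the chi-square distribution of the sum relies on independence between sites.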

With data such as those of Takano et al. (1991), for which sites within 
any one gene can be taken as completely linked and sites in different genes 
can be taken as unlinked, this procedure is not valid since the site-to-site 
independence assumption implicit in it does not then apply. One approach 
to this problem is to continue to use $\sum y$ as test statistic, but to find 
its null hypothesis distribution by simulation, using (as for several of the 
statistics considered above) a simulated coalescent process. 

An informal procedure for assessing neutrality parallel to the method of 
Coyne (1976) discussed in Section 11.2.3 is possible whether sites are linked 
or unlinked. In this procedure a comparison is made between the observed 
number of sites at which there are j nucleotides of one type and n - j of 
another with values given by the conditional frequency spectrum (9.67), 
the conditioning being on the observed value s of S. In this comparison 
allowance must be made for the fact that the sum of the terms in this 
conditional frequency spectrum is 2s, since it is not known at each site 
which is the “original” nucleotide and which is the mutant. An example 
of the comparison of the observed conditional frequency spectrum and the 
neutral theory expected conditional frequency spectrum for the data of 
Takano et al. (1991) is given in Figures 3 and 4 of Tajima (1997). This 
shows in a useful visual way how the observed values of J differ from their 
neutral theory mean values. 

11.3.9 Tests Based on Historical Features 

It was mentioned at the end of Section 9.6.2 that procedures for estimating 
$\theta$ in the infinitely many sites model that rely on historical features are 
currently under intense investigation. These procedures were to some extent 
motivated by the corresponding attempt to devise tests of neutrality based 
on historical features, using the mechanism of the coalescent, initiated by 
Fu and Li (1993) and Fu (1996). These tests are more complicated than 
those described above, and research on them still continues, so the details 
of these procedures will be discussed in Volume II. 




12 

Looking Backward in Time: 
Population and Species Comparisons 



12.1 Introduction 

Perhaps the best-known retrospective activity in evolutionary genetics is 
the reconstruction (more accurately, estimation) of the phylogenetic tree 
of a collection of contemporary populations or species, given genetic data 
from these populations or species. In this chapter we consider stochastic 
processes describing, with greater or lesser accuracy, the evolution of the 
genetic constitution of several populations or species, all descended from a 
common ancestor population or species, in order to carry out this estima- 
tion procedure. We shall use the expression “different population” in this 
analysis, taking this to mean different species if appropriate. 

In this activity we consider a far longer time scale than that considered 
in previous chapters. For example, we have previously considered aspects of 
the time until one allele substitutes for another in some population. In the 
phylogenetic tree estimation process we suppose, because of the far longer 
time scale considered, that these substitutions are in effect instantaneous. 

The data used for the phylogenetic tree estimation normally consists of 
DNA sequences, so the analysis in this chapter is based on the infinitely 
many sites model appropriate for these sequences. In previous chapters we 
have examined aspects of the nature of the variation of DNA sequences 
within a population. However, there is comparatively little variation at the 
nucleotide level between members of the same population. For example, 
two randomly chosen humans typically have different nucleotides at only 
one site in about 500 to 1000. To a sufficient level of approximation it 
is reasonable to assume, for the great majority of sites, that a single nu- 
cleotide predominates in the population. (If this were not so, the concept of 
a paradigm “human genome” for our own species would be meaningless.) 
Thus in the analysis below we assume a situation close to genetic uniformity 
within any population, and then use expressions like “the nucleotide at a 
given site in a population” rather than the more precise “the predominant 
nucleotide at a given site in a population”. 

Several popular phylogenetic tree estimation processes are purely algo- 
rithmic. That is, they start with DNA sequences from the populations 
of interest and by purely algorithmic processes estimate a phylogenetic 
tree from these sequences. The neighbor-joining and parsimony processes 
are two frequently used algorithmic procedures. The mechanistic aspect of 
these processes often leads to the expression “tree reconstruction” rather 
than the more correct “tree estimation”, the latter expression recognizing 
the many stochastic factors involved in evolution and the sampling pro- 
cess leading to the data analyzed. Our focus in this chapter is on these 
stochastic factors. 

Some of the algorithmic processes employed for tree estimation are based 
on the concept of a “genetic distance” between two populations. The recog- 
nition of the stochastic nature of evolution implies that the construction 
of such a distance is not straightforward. This matter is discussed further 
below. 

We shall describe the effectively instantaneous change in frequency of a 
nucleotide from a value close to 0 to a value close to 1 as the substitution 
of one nucleotide by another, meaning more precisely the substitution of 
the predominant nucleotide by another in the population of interest. The 
time unit chosen to evaluate the properties of this substitution process is 
arbitrary, but is often large, perhaps on the order of hundreds of thou- 
sands of generations. Our initial analysis focuses on just one nucleotide 
site, and in particular on the nucleotide at this site that is predominant in 
the population of interest. 

The analysis uses the theory of finite Markov chains, outlined above 
in Section 2.12, described in that section in terms of abstract “states” 
$E_1, E_2, E_3, \ldots, E_s$. In our case s = 4, and the states $E_1, E_2, E_3, E_4$ are 
identified with the events that in the population of interest, the predominant 
nucleotide at the site in question is a, g, c, and t, respectively. Thus 
the Markov chain process is, for example, in state $E_2$ at some given time 
if in the population considered, the predominant nucleotide at the site of 
interest is g at that time. If unit time in the Markov chain is taken as, for example, 
500,000 generations, a change from state $E_3$ to state $E_1$ in one time 
unit means the substitution of the nucleotide c by the nucleotide a after a 
period of 500,000 generations, and a period of time n implies a period of 
500,000n generations. 
During this time it is possible that for certain time periods various other 
states were occupied, that is, that other nucleotides were predominant in 
the population. 

The estimation of a phylogenetic tree is based on the comparison of ge- 
netic data from a number of contemporary populations. Various statistical 
procedures used in this comparison might require tracing up the tree of 
evolution from one population to a common ancestor and then down the 
tree to another population. If the stochastic process assumed for tracing up- 
ward is to be the same as that for tracing downward, the stochastic process 
describing the genetic evolution within each population must be reversible. 
We therefore start by discussing the reversibility criterion in the context of 
evolutionary models, focusing on DNA substitutions and the 4x4 Markov 
chain transition matrices used to describe these substitutions. 

12.1.1 The Reversibility Criterion 

The criterion of reversibility of a Markov chain was discussed in Section 
2.12. Reversibility applies only to Markov chains with a stationary distribution, 
and the criterion that a Markov chain with stationary distribution 
$\phi$ be reversible is given in (2.164). 

An arbitrary 4x4 transition matrix has twelve free parameters, namely 
three free transition probabilities in each of the four rows of the transition 
matrix. (The fourth transition probability in each row is determined by the 
remaining three.) However, another parameterization, using a different set 
of the twelve free parameters, is more useful for us in investigating the re- 
versibility requirement. This parameterization was given by Tavare (1986), 
and under this parameterization the 4x4 transition matrix is written in 
the form 

$$\begin{pmatrix} 1 - uW & uA\phi_2 & uB\phi_3 & uC\phi_4 \\ uD\phi_1 & 1 - uX & uE\phi_3 & uF\phi_4 \\ uG\phi_1 & uH\phi_2 & 1 - uY & uI\phi_4 \\ uJ\phi_1 & uK\phi_2 & uL\phi_3 & 1 - uZ \end{pmatrix}. \qquad (12.1)$$

Here A, B, . . . , L are the twelve free parameters, $(\phi_1, \phi_2, \phi_3, \phi_4)$ is the 
stationary distribution of the Markov chain, and 

$$W = A\phi_2 + B\phi_3 + C\phi_4, \quad X = D\phi_1 + E\phi_3 + F\phi_4,$$

$$Y = G\phi_1 + H\phi_2 + I\phi_4, \quad Z = J\phi_1 + K\phi_2 + L\phi_3.$$

The necessary and sufficient condition for the Markov chain with transition 
matrix written in the form (12.1) to be reversible is that the 
equations 

$$A = D, \quad B = G, \quad C = J, \quad E = H, \quad F = K, \quad I = L \qquad (12.2)$$

all be satisfied. When this requirement is satisfied, the model has six free 
parameters, which can be taken as A, B, C, E, F, and I, so one can think of 
paying for reversibility by losing the choice of six of the twelve parameters 
in the 4x4 transition matrix. 

It is easily checked that when the conditions (12.2) hold, $(\phi_1, \phi_2, \phi_3, \phi_4)$ 
is indeed the stationary distribution of the model (12.1). 
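
This check can also be made numerically. The sketch below builds a 4x4 matrix in which the off-diagonal entry in row i and column j is u times a symmetric coefficient times the stationary probability of state j, which is the reversible case with A = D, B = G, C = J, E = H, F = K, and I = L, and then tests stationarity and the detailed balance condition $\phi_i p_{ij} = \phi_j p_{ji}$. The function names and the numerical values in the test are hypothetical.

```python
def tavare_reversible(u, phi, A, B, C, E, F, I):
    """Reversible 4x4 transition matrix with states ordered a, g, c, t.

    Off-diagonal entries are u * coefficient * phi[column]; each diagonal
    entry is chosen so that its row sums to 1.
    """
    p1, p2, p3, p4 = phi
    P = [
        [0.0, u * A * p2, u * B * p3, u * C * p4],
        [u * A * p1, 0.0, u * E * p3, u * F * p4],
        [u * B * p1, u * E * p2, 0.0, u * I * p4],
        [u * C * p1, u * F * p2, u * I * p3, 0.0],
    ]
    for i in range(4):
        P[i][i] = 1.0 - sum(P[i])
    return P


def is_stationary(phi, P, tol=1e-12):
    """Check that phi P = phi."""
    return all(
        abs(sum(phi[i] * P[i][j] for i in range(4)) - phi[j]) < tol
        for j in range(4)
    )


def is_reversible(phi, P, tol=1e-12):
    """Check detailed balance: phi_i p_ij = phi_j p_ji for all i, j."""
    return all(
        abs(phi[i] * P[i][j] - phi[j] * P[j][i]) < tol
        for i in range(4)
        for j in range(4)
    )
```

The value of u must be small enough for every diagonal entry to remain nonnegative; the sketch does not enforce this.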



12.2 Various Evolutionary Models 

12.2.1 The Jukes-Cantor Model 

The simplest (and earliest) model of nucleotide substitution is the Jukes- 
Cantor model (Jukes and Cantor (1969)). Using the convention of the states 
of a Markov chain given in Section 12.1, the transition matrix P for this 
model is given by 



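$$P = \begin{pmatrix} 1 - 3\alpha & \alpha & \alpha & \alpha \\ \alpha & 1 - 3\alpha & \alpha & \alpha \\ \alpha & \alpha & 1 - 3\alpha & \alpha \\ \alpha & \alpha & \alpha & 1 - 3\alpha \end{pmatrix}. \qquad (12.3)$$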



Thus in the Jukes-Cantor model it is assumed that whatever the nucleotide 
in the population is at any time, the three other nucleotides are equally 
likely to substitute for it. The model therefore possesses an unrealistic 
assumption of symmetry, and thus may not reasonably be used as an accu- 
rate evolutionary model. We discuss more realistic models below. However, 
several formulas used in phylogeny theory are based on the Jukes-Cantor 
model, often without explicit recognition of this fact, so we now discuss the 
properties of this model in perhaps greater detail than its intrinsic value 
warrants. 

In the model (12.3) $\alpha$ is a parameter depending on the time scale chosen: 
If unit time were chosen as 500,000 generations, $\alpha$ would take a value 
smaller than it would if unit time were chosen as 1,000,000 generations. 
Whatever time scale is chosen, it is clearly necessary that $\alpha$ be less than 1/3. 

Elementary Markov chain theory shows that the stationary distribution 
$\phi = (\phi_1, \phi_2, \phi_3, \phi_4)'$ for this model, defined in (2.157), is the uniform 
distribution 

$$(\phi_1, \phi_2, \phi_3, \phi_4)' = (.25, .25, .25, .25)', \qquad (12.4)$$

as might be expected from the symmetry of the model. The results of 
Section 12.1.1 then show that this model is reversible. 

It is straightforward to show for this model that whatever the predomi- 
nant nucleotide in the population is at time 0, the probability that this is 
also the predominant nucleotide at time n is 

$$\frac{1}{4} + \frac{3}{4}(1 - 4\alpha)^n, \qquad (12.5)$$

and the probability that some other specified nucleotide is the predominant 
nucleotide at time n is 

$$\frac{1}{4} - \frac{1}{4}(1 - 4\alpha)^n. \qquad (12.6)$$
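
These n-step probabilities can be confirmed by raising the Jukes-Cantor transition matrix to the power n. The following sketch compares the matrix power with the closed forms $\frac{1}{4} + \frac{3}{4}(1 - 4\alpha)^n$ and $\frac{1}{4} - \frac{1}{4}(1 - 4\alpha)^n$; the function names and the chosen value of $\alpha$ are illustrative only.

```python
def jukes_cantor(alpha):
    """One-step transition matrix: alpha off the diagonal, 1 - 3 alpha on it."""
    return [[1.0 - 3.0 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]


def mat_mul(A, B):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def mat_pow(P, n):
    """n-step transition matrix, by repeated multiplication."""
    result = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def jc_same(alpha, n):
    """Closed-form probability of the same nucleotide after n time units."""
    return 0.25 + 0.75 * (1.0 - 4.0 * alpha) ** n


def jc_other(alpha, n):
    """Closed-form probability of one specified other nucleotide after n time units."""
    return 0.25 - 0.25 * (1.0 - 4.0 * alpha) ** n
```

As n increases both expressions approach 1/4, the stationary probability of each nucleotide.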



12.2.2 The Kimura Model and Its Generalizations 

The highly symmetric assumptions implicit in the Jukes-Cantor model 
are not realistic. A transition, that is, the replacement of one purine by 
the other (for example, of a by g) or of one pyrimidine by the other, is 
in practice more likely than a transversion, that is, the replacement of 
a purine by a pyrimidine or of a pyrimidine by a purine. Kimura (1980) 
proposed a (continuous-time) two-parameter model to allow for this. The 
transition matrix P for the discrete-time version of this model, with the 
ordering of states given in Section 12.1, is 

$$P = \begin{pmatrix} 1 - \alpha - 2\beta & \alpha & \beta & \beta \\ \alpha & 1 - \alpha - 2\beta & \beta & \beta \\ \beta & \beta & 1 - \alpha - 2\beta & \alpha \\ \beta & \beta & \alpha & 1 - \alpha - 2\beta \end{pmatrix}. \qquad (12.7)$$

Here $\alpha$ is the probability of a transition in one time unit, while $\beta$ is the 
probability that a purine is substituted by a nominated pyrimidine in one 
time unit and is also the probability that a pyrimidine is substituted by a 
nominated purine in one time unit. It is, of course, required that $\alpha + 2\beta < 1$. 

The stationary distribution for this model is easily shown to be the dis- 
crete uniform distribution given in (12.4), and from this the results of 
Section 12.1.1 show that the model is reversible. 

It can also be shown for this model that whatever the predominant 
nucleotide at time 0 at any site, the probability that this is also the 
predominant nucleotide at time n is 

$$\frac{1}{4} + \frac{1}{4}(1 - 4\beta)^n + \frac{1}{2}(1 - 2(\alpha + \beta))^n. \qquad (12.8)$$

If the initial nucleotide is a purine, the probability that at time n the 
predominant nucleotide is the other purine is 

$$\frac{1}{4} + \frac{1}{4}(1 - 4\beta)^n - \frac{1}{2}(1 - 2(\alpha + \beta))^n. \qquad (12.9)$$

A parallel remark holds for pyrimidines. The probability that after n time 
units a purine has been substituted by a specific pyrimidine is 

$$\frac{1}{4} - \frac{1}{4}(1 - 4\beta)^n, \qquad (12.10)$$

and the probability that it has been replaced by one or the other pyrimidine 
is 

$$\frac{1}{2} - \frac{1}{2}(1 - 4\beta)^n. \qquad (12.11)$$



A parallel remark holds for the replacement of a pyrimidine by a purine. 
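
The n-step probabilities for the Kimura model can be checked by propagating a starting distribution through the one-step matrix. The sketch below starts with the purine a as the predominant nucleotide (states ordered a, g, c, t) and compares the result with closed forms built from the two quantities $(1 - 4\beta)^n$ and $(1 - 2(\alpha + \beta))^n$; the function names and parameter values are hypothetical.

```python
def kimura_matrix(alpha, beta):
    """One-step transition matrix, states ordered a, g, c, t."""
    d = 1.0 - alpha - 2.0 * beta
    return [
        [d, alpha, beta, beta],
        [alpha, d, beta, beta],
        [beta, beta, d, alpha],
        [beta, beta, alpha, d],
    ]


def step(dist, P):
    """Advance a probability row vector by one time unit."""
    return [sum(dist[i] * P[i][j] for i in range(4)) for j in range(4)]


def kimura_closed_forms(alpha, beta, n):
    """Return (same nucleotide, the transition partner, one specified
    transversion partner) probabilities after n time units."""
    l1 = (1.0 - 4.0 * beta) ** n
    l2 = (1.0 - 2.0 * (alpha + beta)) ** n
    same = 0.25 + 0.25 * l1 + 0.5 * l2
    transition = 0.25 + 0.25 * l1 - 0.5 * l2
    transversion = 0.25 - 0.25 * l1
    return same, transition, transversion
```

The probability of replacement by one or the other pyrimidine is twice the single transversion value, and the four probabilities sum to 1.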

Although the Kimura model is more realistic than the Jukes-Cantor 
model, it still does not provide a satisfactory evolutionary model. Apart 
from the assumptions implied in the form of the transition matrix P for 
the model, the uniform stationary distribution implied by the model is not 
realistic. There are various increasingly realistic, but at the same time in- 
creasingly complex, generalizations of this model in the literature, leading, 
for example, to models with nonuniform stationary distributions, but the 
increasing complexity implies that the criterion of reversibility for a com- 
plex model is less likely to hold. This means that in practice, a compromise 
must be reached, in the modeling process, between a simple model allowing 
a tractable analysis and satisfying the reversibility criterion, and a more 
realistic model that is difficult to analyze and might not be reversible. 

One model more complex than the Kimura model is that of Blaisdell 
(1985), which allows different within-transition and within-transversion 
rates. The transition matrix P for this model is 



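\[
P = \begin{pmatrix}
1-\alpha-2\gamma & \alpha & \gamma & \gamma \\
\beta & 1-\beta-2\gamma & \gamma & \gamma \\
\delta & \delta & 1-\beta-2\delta & \beta \\
\delta & \delta & \alpha & 1-\alpha-2\delta
\end{pmatrix}. \tag{12.12}
\]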



The stationary distribution of this Markov chain is found to be 



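\[
\left(
\frac{\delta(\beta+\gamma)}{\theta(\alpha+\beta+2\gamma)},\;
\frac{\delta(\alpha+\gamma)}{\theta(\alpha+\beta+2\gamma)},\;
\frac{\gamma(\alpha+\delta)}{\theta(\alpha+\beta+2\delta)},\;
\frac{\gamma(\beta+\delta)}{\theta(\alpha+\beta+2\delta)}
\right)', \tag{12.13}
\]

where $\theta = \gamma + \delta$. This stationary distribution and the elements in the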
transition matrix (12.12) show that this model is not reversible. On the 
other hand, the elements in the stationary distribution can now all be 
different from one another, a property not enjoyed by the Jukes-Cantor 
and Kimura models discussed above. 
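Both claims, that (12.13) is stationary for (12.12) and that the chain is not reversible, can be checked numerically. The sketch below is an illustration only: the four parameter values are arbitrary, the function names are hypothetical, and states 0 and 1 are taken to be the purines.

```python
# Sketch: check stationarity of (12.13) and non-reversibility for the matrix (12.12).
# The four parameter values are arbitrary illustrative choices, not from the text.

def blaisdell_matrix(alpha, beta, gamma, delta):
    """Transition matrix (12.12); states 0,1 are purines and 2,3 pyrimidines."""
    return [
        [1 - alpha - 2 * gamma, alpha, gamma, gamma],
        [beta, 1 - beta - 2 * gamma, gamma, gamma],
        [delta, delta, 1 - beta - 2 * delta, beta],
        [delta, delta, alpha, 1 - alpha - 2 * delta],
    ]

def blaisdell_stationary(alpha, beta, gamma, delta):
    """Stationary distribution (12.13), with theta = gamma + delta."""
    theta = gamma + delta
    purine_den = theta * (alpha + beta + 2 * gamma)
    pyrimidine_den = theta * (alpha + beta + 2 * delta)
    return [
        delta * (beta + gamma) / purine_den,
        delta * (alpha + gamma) / purine_den,
        gamma * (alpha + delta) / pyrimidine_den,
        gamma * (beta + delta) / pyrimidine_den,
    ]

alpha, beta, gamma, delta = 0.03, 0.01, 0.004, 0.008
P = blaisdell_matrix(alpha, beta, gamma, delta)
pi = blaisdell_stationary(alpha, beta, gamma, delta)

# Stationarity: the row vector pi times P should reproduce pi.
pi_next = [sum(pi[i] * P[i][j] for i in range(4)) for j in range(4)]
stationarity_gap = max(abs(a - b) for a, b in zip(pi, pi_next))

# Reversibility would require pi_i P_ij = pi_j P_ji for every pair (i, j).
balance_gap = max(abs(pi[i] * P[i][j] - pi[j] * P[j][i])
                  for i in range(4) for j in range(4))
print(stationarity_gap, balance_gap)
```

For the two purines the detailed-balance difference is proportional to $\gamma\delta(\alpha - \beta)$, so it vanishes for that pair only when $\alpha = \beta$.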



12.2.3 The Felsenstein Models 

A form of generalization of the Jukes-Cantor model different from those 
considered above was introduced by Felsenstein (1981), whose notation we 
adopt here. In these models the probability of substitution of any nucleotide 
by another is proportional to the stationary probability of the substituting
nucleotide. This implies a transition matrix P of the form

\[
P = \begin{pmatrix}
1-u+u\phi_1 & u\phi_2 & u\phi_3 & u\phi_4 \\
u\phi_1 & 1-u+u\phi_2 & u\phi_3 & u\phi_4 \\
u\phi_1 & u\phi_2 & 1-u+u\phi_3 & u\phi_4 \\
u\phi_1 & u\phi_2 & u\phi_3 & 1-u+u\phi_4
\end{pmatrix}, \tag{12.14}
\]



where $(\phi_1, \phi_2, \phi_3, \phi_4)'$ is the stationary distribution and $u$ is a parameter
of the model. (It is easily checked that the stationary distribution for the
model defined by (12.14) is indeed $(\phi_1, \phi_2, \phi_3, \phi_4)'$.)

A second Felsenstein model (Felsenstein and Churchill (1996); see also 
Kishino and Hasegawa (1989)) is more general than that given by (12.14), 
and is important because it is the evolutionary model used in the PHYLIP 
phylogenetic tree estimation package. This model has a transition matrix 
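similar to that of (12.14), except that the upper-left $2 \times 2$ submatrix of the
$4 \times 4$ matrix in (12.14) is replaced by

\[
\begin{pmatrix}
1-u+u\phi_1 - \dfrac{uK\phi_2}{\phi_1+\phi_2} & u\phi_2 + \dfrac{uK\phi_2}{\phi_1+\phi_2} \\[2ex]
u\phi_1 + \dfrac{uK\phi_1}{\phi_1+\phi_2} & 1-u+u\phi_2 - \dfrac{uK\phi_1}{\phi_1+\phi_2}
\end{pmatrix} \tag{12.15}
\]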



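and the lower-right $2 \times 2$ submatrix of the $4 \times 4$ matrix in (12.14) is replaced
by

\[
\begin{pmatrix}
1-u+u\phi_3 - \dfrac{uK\phi_4}{\phi_3+\phi_4} & u\phi_4 + \dfrac{uK\phi_4}{\phi_3+\phi_4} \\[2ex]
u\phi_3 + \dfrac{uK\phi_3}{\phi_3+\phi_4} & 1-u+u\phi_4 - \dfrac{uK\phi_3}{\phi_3+\phi_4}
\end{pmatrix}. \tag{12.16}
\]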



The transition matrix defined jointly by (12.14), (12.15), and (12.16), as
with the simpler model (12.14), has stationary distribution $(\phi_1, \phi_2, \phi_3, \phi_4)'$.
From this it is easily shown that the model is reversible. The quantity
$K$ is positive and is a further parameter of the model: larger values of
$K$ increase transition substitution rates compared to those in the model
(12.14).

Although the model (12.14) generalizes the Jukes-Cantor model, to
which it reduces if $\phi_1 = \phi_2 = \phi_3 = \phi_4 = 1/4$, it does not generalize
the Kimura two-parameter model (12.7). On the other hand, the model
defined jointly by (12.14), (12.15), and (12.16) does generalize the Kimura
two-parameter model, reducing to that model when the stationary distribution
is uniform. (This requires the identification of the parameters $\alpha$ and
$\beta$ in the Kimura model with $u(2K + 1)/4$ and $u/4$, respectively.) It also, of
course, generalizes the model (12.14), to which it reduces when $K = 0$.
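The properties just stated (stationarity, reversibility, and the reduction to the Kimura model under a uniform stationary distribution) can be illustrated numerically. This is a sketch under stated assumptions: the function name `felsenstein_matrix` is hypothetical, the values of `phi`, `u`, and `K` are arbitrary, and states 0 and 1 are taken to be the purines.

```python
# Sketch: build the matrix defined jointly by (12.14)-(12.16) and check its properties.
# phi, u and K are arbitrary illustrative values, not taken from the text.

def felsenstein_matrix(phi, u, K=0.0):
    """Matrix (12.14) when K = 0; (12.14), (12.15), (12.16) jointly when K > 0."""
    P = [[u * phi[j] for j in range(4)] for _ in range(4)]
    for first, second in [(0, 1), (2, 3)]:      # purine pair, pyrimidine pair
        total = phi[first] + phi[second]
        P[first][second] += u * K * phi[second] / total
        P[second][first] += u * K * phi[first] / total
    for i in range(4):
        # Each diagonal entry is fixed by the requirement that the row sums to 1.
        P[i][i] = 1.0 - sum(P[i][j] for j in range(4) if j != i)
    return P

phi = [0.1, 0.2, 0.3, 0.4]
u, K = 0.1, 2.0
P = felsenstein_matrix(phi, u, K)

# Stationarity: phi P = phi.
phi_next = [sum(phi[i] * P[i][j] for i in range(4)) for j in range(4)]
stationarity_gap = max(abs(a - b) for a, b in zip(phi, phi_next))

# Reversibility: phi_i P_ij = phi_j P_ji for every pair of states.
balance_gap = max(abs(phi[i] * P[i][j] - phi[j] * P[j][i])
                  for i in range(4) for j in range(4))

# Uniform phi: the Kimura matrix (12.7) with alpha = u(2K+1)/4 and beta = u/4.
P_uniform = felsenstein_matrix([0.25] * 4, u, K)
alpha, beta = u * (2 * K + 1) / 4, u / 4
kimura_gap = max(abs(P_uniform[0][1] - alpha),
                 abs(P_uniform[0][2] - beta),
                 abs(P_uniform[0][0] - (1 - alpha - 2 * beta)))
print(stationarity_gap, balance_gap, kimura_gap)
```

All three gaps should be at rounding level for any admissible choice of the parameters.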

A model rather similar to that defined jointly by (12.14), (12.15), and 
(12.16) was introduced by Hasegawa et al. (1985). In this model the 
transition probability matrix P is of the form 



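\[
P = \begin{pmatrix}
1 - u\phi_2 - v\phi_A & u\phi_2 & v\phi_3 & v\phi_4 \\
u\phi_1 & 1 - u\phi_1 - v\phi_A & v\phi_3 & v\phi_4 \\
v\phi_1 & v\phi_2 & 1 - u\phi_4 - v\phi_B & u\phi_4 \\
v\phi_1 & v\phi_2 & u\phi_3 & 1 - u\phi_3 - v\phi_B
\end{pmatrix}, \tag{12.17}
\]

where $\phi_A = \phi_c + \phi_t$ and $\phi_B = \phi_a + \phi_g$. In the numbering of states used in
(12.17) these are $\phi_A = \phi_3 + \phi_4$ and $\phi_B = \phi_1 + \phi_2$, as the requirement that each
row of $P$ sum to 1 shows. This model is an amalgam of the
Kimura model (12.7) and the simpler Felsenstein model (12.14), and includes
these as particular cases. The stationary distribution in this model
is $(\phi_1, \phi_2, \phi_3, \phi_4)'$, as the notation anticipates, and this model is also
reversible.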
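The two particular cases and the reversibility of the Hasegawa et al. model can be checked with a short sketch. The function name `hky_matrix` and all numerical values are illustrative assumptions; states 0 and 1 are taken to be the purines, so that transitions get the factor `u` and transversions the factor `v`.

```python
# Sketch: build the matrix (12.17) and check its particular cases.
# phi, u and v are arbitrary illustrative values, not taken from the text.

def hky_matrix(phi, u, v):
    """Matrix (12.17): off-diagonal entry u*phi[j] within a group, v*phi[j] across."""
    P = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            if i != j:
                same_group = (i < 2) == (j < 2)
                P[i][j] = (u if same_group else v) * phi[j]
        # The diagonal is still zero here, so sum(P[i]) is the off-diagonal total.
        P[i][i] = 1.0 - sum(P[i])
    return P

phi = [0.1, 0.2, 0.3, 0.4]
u, v = 0.2, 0.05
P = hky_matrix(phi, u, v)

# Reversibility with respect to phi (which also implies stationarity of phi).
balance_gap = max(abs(phi[i] * P[i][j] - phi[j] * P[j][i])
                  for i in range(4) for j in range(4))

# Uniform phi: the Kimura matrix (12.7) with alpha = u/4 and beta = v/4.
P_uniform = hky_matrix([0.25] * 4, u, v)
kimura_gap = max(abs(P_uniform[0][1] - u / 4), abs(P_uniform[0][2] - v / 4))

# u = v: the simpler Felsenstein matrix (12.14), with diagonal 1 - u + u*phi[i].
P_equal = hky_matrix(phi, u, u)
felsenstein_gap = max(abs(P_equal[i][i] - (1 - u + u * phi[i])) for i in range(4))
print(balance_gap, kimura_gap, felsenstein_gap)
```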



12.3 Some Implications 



12.3.1 Introduction 



In this section we consider some of the immediate implications of the calcu- 
lations given in the previous section for the Jukes-Cantor and the Kimura 
models. 

Before doing so, we observe that most calculations used for phylogenetic
estimation use continuous-time Markov processes rather than the
discrete-time Markov chains considered above. We therefore list here the
continuous-time analogues of relevant calculations for the Jukes-Cantor
and the Kimura models, and use these in the discussion in this section.
Specifically, the continuous-time analogues of the Jukes-Cantor model
expressions in (12.5) and (12.6) are

\[ q_1 = \frac14 + \frac34 e^{-4\alpha t} \quad \text{and} \quad q_2 = \frac14 - \frac14 e^{-4\alpha t}, \tag{12.18} \]

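respectively, and the continuous-time analogues of the Kimura model
expressions (12.8), (12.9), and (12.11) are

\[
\frac14 + \frac14 e^{-4\beta t} + \frac12 e^{-2(\alpha+\beta)t}, \qquad
\frac14 + \frac14 e^{-4\beta t} - \frac12 e^{-2(\alpha+\beta)t}, \tag{12.19}
\]

and

\[ \frac12 - \frac12 e^{-4\beta t}, \tag{12.20} \]

respectively.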



12.3.2 The Jukes-Cantor Model 



We start by considering various implications of the Jukes-Cantor
expressions (12.18). Suppose that two populations split at time $t$ in the past.
Then the same nucleotide type arises at a given site in both contemporary
populations if they are both copies of the ancestral nucleotide at the time of
the split (probability $q_1^2$) or if they are both copies of some other nucleotide
(probability $3q_2^2$). The total probability $r_1$ is then given by



\[ r_1 = q_1^2 + 3q_2^2 = \frac14 + \frac34 e^{-8\alpha t}. \tag{12.21} \]

This is the analogue of $q_1$. Similarly, the analogue of $q_2$ is

\[ r_2 = \frac14 - \frac14 e^{-8\alpha t}. \tag{12.22} \]

These results can be found immediately by reversibility, considering the
stochastic process going up one line of ascent from one of the two populations
to the common ancestor, and then down the other line of descent
to the other population. Thus the probabilities $r_1$ and $r_2$ can be found
from $q_1$ and $q_2$, respectively, simply by replacing $t$ by $2t$, and this gives the
expressions in (12.21) and (12.22).
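For readers who prefer to see the direct calculation rather than the reversibility argument, the algebra behind (12.21) can be written out as follows. This is a worked check using only the expressions in (12.18).

```latex
\begin{align*}
r_1 = q_1^2 + 3q_2^2
  &= \tfrac{1}{16}\bigl(1 + 3e^{-4\alpha t}\bigr)^2
   + \tfrac{3}{16}\bigl(1 - e^{-4\alpha t}\bigr)^2 \\
  &= \tfrac{1}{16}\bigl(1 + 6e^{-4\alpha t} + 9e^{-8\alpha t}
   + 3 - 6e^{-4\alpha t} + 3e^{-8\alpha t}\bigr) \\
  &= \tfrac14 + \tfrac34 e^{-8\alpha t}.
\end{align*}
```

The cross terms cancel, and the result is $q_1$ with $t$ replaced by $2t$, in agreement with the reversibility argument.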

We write $p = 1 - r_1 = 3r_2$ as the probability that the two nucleotides
differ, so that

\[ p = \frac34 \bigl(1 - e^{-8\alpha t}\bigr). \tag{12.23} \]

From this,

\[ \alpha t = -\frac18 \log\left(1 - \frac43 p\right). \tag{12.24} \]



The probability $p$ can be estimated unbiasedly by the proportion $\hat{p}$ of
nucleotide sites at which the two populations being compared differ in their
respective homologous DNA sequences sampled. Common practice is, then,
to estimate $\alpha t$ by $\widehat{\alpha t}$, defined by

\[ \widehat{\alpha t} = -\frac18 \log\left(1 - \frac43 \hat{p}\right). \tag{12.25} \]

If an extrinsic estimate of $\alpha$ is available, this gives an estimate of the time
$t$ since the initial split of the two populations.

This procedure gives a biased estimator of $t$, and indeed, the estimator
is not even defined if $\hat{p} > 3/4$. It also depends on the unrealistic Jukes-Cantor
model. Despite this, this estimator appears, often uncritically, in
the literature. Since this time $t$ is a critical feature of a phylogenetic tree
estimation, we can then expect biased estimation of phylogenetic trees if
the estimator, or its generalizations when many populations are considered,
is used.

The estimator (12.25) does have one interesting property. Write $\lambda = 3\alpha$
as the “total” substitution rate and suppose that the estimator $\hat{p} = D/N$
is derived from the comparison of $N$ nucleotide sites in one population and
the corresponding $N$ sites in the other population, where $D$ is the number
of sites for which the two sequences compared differ. Then the total mean
number of substitutions down both lines of descent from the initial ancestor
population is $\nu = 2N\lambda t = 6N\alpha t$. This would then be estimated, from
(12.25), by

\[ \hat{\nu} = -\frac{3N}{4} \log\left(1 - \frac43 \hat{p}\right). \tag{12.26} \]

If $\hat{p}$ is small, a Taylor series approximation gives

\[ \hat{\nu} \approx N\hat{p} + \frac23 N\hat{p}^2 = D + \frac{2D^2}{3N}. \tag{12.27} \]

This implies that the estimated number of substitutions is somewhat larger
than the observed number, the difference arising from the estimated number
of sites at which the same substitution arose in the two populations,
together with the estimated number of sites at which two or more
substitutions occurred down one or other line of descent, concluding with the
same nucleotide. If, for example, $N = 3{,}000$ and $D = 300$, the approximation
(12.27) leads to an estimated total of 320 substitutions, 20 of which are
estimated not to be observed in the contemporary sample.
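The calculation in (12.25) to (12.27) can be reproduced in a few lines. The sketch below is illustrative only; the function name `jc_substitutions` is hypothetical, and the guard against $\hat{p} \ge 3/4$ reflects the fact that the logarithm in (12.25) is then not defined.

```python
import math

def jc_substitutions(N, D):
    """Jukes-Cantor estimates from D differing sites among N compared sites.

    Returns (estimate of alpha*t from (12.25),
             estimate of nu from (12.26),
             Taylor approximation to it from (12.27)).
    """
    p_hat = D / N
    if p_hat >= 0.75:
        raise ValueError("the estimator (12.25) is not defined for p_hat >= 3/4")
    alpha_t = -math.log(1.0 - 4.0 * p_hat / 3.0) / 8.0   # (12.25)
    nu_exact = 6.0 * N * alpha_t                         # (12.26), nu = 6 N alpha t
    nu_taylor = D + 2.0 * D * D / (3.0 * N)              # (12.27)
    return alpha_t, nu_exact, nu_taylor

# The worked example in the text: N = 3000 sites, D = 300 differences.
alpha_t, nu_exact, nu_taylor = jc_substitutions(3000, 300)
print(round(nu_exact, 1), nu_taylor)
```

For these values the Taylor approximation gives 320, as in the text, while the untruncated expression (12.26) gives a value close to 322; the two differ because $\hat{p} = 0.1$ is small but not negligible.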

As remarked above, several tree estimation algorithms depend on the
use of some measure of genetic distance between two populations. The
argument given above implies that $\hat{\nu}$ forms a better measure of genetic
distance between the two populations than does the count $D$ of sites at
which different nucleotides are observed in the two populations.

There are many further implicit assumptions made when $\alpha$ is estimated
from data from many sites. One of these is that the value of $\alpha$ is the
same at all sites. This assumption is undoubtedly untrue, and with site-to-site
variation in the value of $\alpha$, (12.25) gives an underestimate of $\lambda t$ (Nei
(1975, pp. 225-226)), and thus of the genetic distance between the two
populations.

The concept of a distance as just described assumes genetic uniformity 
within populations. A more general definition of distance allows for nu- 
cleotide segregation within populations. Here various measures (Sokal and 
Sneath (1963), Rogers (1972), Hedrick (1971), Cavalli-Sforza and Edwards 
(1967), and Nei (1972)) have been proposed for various purposes. For evo- 
lutionary considerations we require a measure that, if substitutions occur 
at a constant rate, is proportional to the time t between the splitting of the 
two populations considered. Nei (1976) showed by computer simulation 
in the infinitely many alleles case that the expected value of his genetic 
distance measure $D_N$ increases almost linearly with time, and that this
property is not shared by the other distance measures above. The infinitely
many alleles model is not appropriate for nucleotide sequence data, so it
does not necessarily follow that $D_N$ has this linearity property for these
data. It is therefore useful to examine properties of $D_N$ for the infinitely
many sites model. We now do this, assuming that the Jukes-Cantor model
holds. 

We consider two populations that split at time $t$ in the past. Suppose that
at a given site, the frequencies of the four nucleotides in one population
are given as $x_1$, $x_2$, $x_3$, and $x_4$. Suppose that $y_1$, $y_2$, $y_3$, and $y_4$ are the
corresponding (random) frequencies in the other population. For this case
$D_N$ is defined as

\[
D_N = -\log\left( \frac{\sum\sum x_i y_i}{\sqrt{\sum\sum x_i^2 \, \sum\sum y_i^2}} \right), \tag{12.28}
\]

the three outer sums being taken over all sites considered.

Now, $E(y_i \mid x_i) = x_i r_1 + (1 - x_i) r_2$, so that from (12.21) and (12.22),

\[ E(y_i \mid x_i) = \frac14 \bigl(1 - e^{-8\alpha t}\bigr) + x_i e^{-8\alpha t}. \tag{12.29} \]

From this,

\[ E \sum x_i y_i = \frac14 \bigl(1 - e^{-8\alpha t}\bigr) + e^{-8\alpha t} E \sum x_i^2. \tag{12.30} \]

Since we assume essential homogeneity within any population, we make the
approximations $E \sum x_i^2 \approx E \sum y_i^2 \approx 1$. Inserting these approximate values
in (12.28) and (12.30), and assuming a large number of nucleotide sites
examined, so that random sampling effects can be ignored, we obtain

\[ D_N \approx -\log\left( \frac14 + \frac34 e^{-8\alpha t} \right). \tag{12.31} \]

(The right-hand side in this expression could also be obtained directly
from (12.21).) If terms of order $(\alpha t)^3$ are ignored, $D_N$ is approximately
$6\alpha t(1 - \alpha t)$, and is thus essentially a linear function of $t$ only when $\alpha t < 0.1$.
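The second-order approximation quoted above follows from two Taylor expansions. The steps are written out here as a worked check, with $x = \alpha t$ introduced only as shorthand.

```latex
\begin{align*}
\tfrac14 + \tfrac34 e^{-8x}
  &= \tfrac14 + \tfrac34\bigl(1 - 8x + 32x^2\bigr) + O(x^3)
   = 1 - 6x + 24x^2 + O(x^3), \\
-\log\bigl(1 - 6x + 24x^2\bigr)
  &= \bigl(6x - 24x^2\bigr) + \tfrac12 (6x)^2 + O(x^3) \\
  &= 6x - 6x^2 + O(x^3)
   = 6\alpha t\,(1 - \alpha t) + O\bigl((\alpha t)^3\bigr).
\end{align*}
```

At $\alpha t = 0.1$ the factor $1 - \alpha t$ is 0.9, so the departure from the linear term $6\alpha t$ is already about 10 percent.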



12.3.3 The Kimura Model 



If the Kimura evolutionary model is assumed, a set of calculations similar 
to those leading to (12.26) and (12.27) can be made. In this case both the 
parameters $\alpha t$ and $\beta t$ are estimated, the data used being the numbers of
transitional and transversional differences observed in the data. Suppose
that at any time the predominant nucleotide is a specified purine (respectively
pyrimidine). We must first find the probability $p_1$ that at time $t$ later
the predominant nucleotide is the other purine (respectively pyrimidine).
This probability is given by

\[ p_1 = \frac14 + \frac14 e^{-4\beta t} - \frac12 e^{-2(\alpha+\beta)t}. \tag{12.32} \]



If at any time the predominant nucleotide is a specified purine (respectively
pyrimidine), then the probability $p_2$ that at time $t$ later the predominant
nucleotide is one or other pyrimidine (respectively purine) is

\[ p_2 = \frac12 - \frac12 e^{-4\beta t}. \tag{12.33} \]

These probabilities may be estimated by the respective sample proportions
$\hat{p}_1 = n_1/N$ and $\hat{p}_2 = n_2/N$, where in a sample of $N$ sites there are $n_1$
sites at which one purine (pyrimidine) arises in the sequence
of one population and the other purine (pyrimidine) arises at that site in
the sequence of the other population, and $n_2$ sites at which a purine occurs
in one sequence and a pyrimidine in the other.

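From these equations $\alpha t$ and $\beta t$ may be estimated by solving the
equations

\[ \hat{p}_1 = \frac14 + \frac14 e^{-4\widehat{\beta t}} - \frac12 e^{-2(\widehat{\alpha t} + \widehat{\beta t})} \tag{12.34} \]

and

\[ \hat{p}_2 = \frac12 - \frac12 e^{-4\widehat{\beta t}}. \tag{12.35} \]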



These estimation procedures are subject to the same qualifications as were 
made for the parallel Jukes-Cantor model estimation procedure. 

The mean rate at which substitutions of one form or another arise is
$\alpha + 2\beta$, and from this it is found that the mean number of substitutions
in the evolution of the two populations since their common ancestor is
$\nu = 2N(\alpha + 2\beta)t$. This can be estimated from the estimates of $\alpha t$ and $\beta t$
implicit in (12.34) and (12.35). The result is that the estimate of $\nu$ is

\[ \hat{\nu} = -\frac12 N \log(1 - 2\hat{p}_1 - \hat{p}_2) - \frac14 N \log(1 - 2\hat{p}_2). \tag{12.36} \]

Suppose that in a sample of $N = 3000$ sites there are 210 transitional and
90 transversional differences. Then $\hat{p}_1 = 0.07$ and $\hat{p}_2 = 0.03$, and (12.36)
leads to an estimate of $\nu$ of 326. This exceeds the observed number 300 of
nucleotide differences, indicating that 26 substitutions
are estimated not to be “observed” in the contemporary sample.
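The numerical example can be reproduced directly from (12.36). The sketch below is illustrative only: the function name `kimura_substitutions` is hypothetical, and the guard reflects the requirement that both logarithms in (12.36) have positive arguments.

```python
import math

def kimura_substitutions(N, n1, n2):
    """Estimate of nu from (12.36).

    N  : number of sites compared
    n1 : number of sites showing a transitional difference
    n2 : number of sites showing a transversional difference
    """
    p1, p2 = n1 / N, n2 / N
    a = 1.0 - 2.0 * p1 - p2
    b = 1.0 - 2.0 * p2
    if a <= 0.0 or b <= 0.0:
        raise ValueError("the estimator (12.36) is not defined for these proportions")
    return -0.5 * N * math.log(a) - 0.25 * N * math.log(b)

# The worked example in the text: 210 transitional and 90 transversional differences.
nu_hat = kimura_substitutions(3000, 210, 90)
print(round(nu_hat))
```

With these counts the estimate rounds to 326, the value quoted in the text, against 300 observed differences.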

More important, it differs from the estimated value of 320 that would 
arise if the Jukes-Cantor estimation procedure, which does not distinguish 
between transitional and transversional substitutions, had been used. This 
implies that if the Kimura model faithfully describes the evolutionary pro- 
cess, a bias will arise in the estimation of the genetic distance between the 
two populations if the simple Jukes-Cantor model is used for the distance 
estimation procedure. In practice the Kimura model is itself over-simplified, 
and even greater biases may be expected if the Jukes-Cantor model is used 
when a far more complex model is appropriate. This is in addition to the 
bias inherent in the Jukes-Cantor model itself, described below (12.25). The 
same conclusion applies for any comparatively simple evolutionary model. 
Thus phylogenetic trees estimated from simple stochastic models must be 
viewed with much caution. 



12.4 Statistical Procedures 

The fact that one is only estimating a phylogenetic tree from contemporary
data, rather than constructing it without error, implies that many statistical
issues arise, mainly questions of hypothesis testing. One frequently used
test statistic is the well-known $-2 \log \lambda$ statistic, where $\lambda$ is a ratio of (maximum)
likelihoods, one calculated under the null hypothesis and one under
the alternative hypothesis. When various criteria are satisfied, $-2 \log \lambda$ has
an approximate chi-square distribution when the null hypothesis is true.

One of these criteria is that the null hypothesis be a particular case of 
the alternative hypothesis. Another is that the hypothesis testing procedure 
must relate to the value (or values) of some parameter (or parameters) that 
can take continuous real number values only. There are further criteria also, 
but we do not discuss them here. 

One procedure that is frequently carried out is to test whether some 
more complex evolutionary model explains the data better than a simpler 
evolutionary model. Although neither model can be accepted as giving 
a reasonable description of evolution, we illustrate the problems involved 
with such a procedure by discussing the test of whether the Kimura two- 
parameter model explains the data better than the Jukes-Cantor model. 
In statistical terms, this is a test of the null hypothesis that the parameter
$\beta$ in (12.7) is equal to the parameter $\alpha$ against the alternative that allows
the two parameters to take any values.
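To fix ideas, the statistic can be computed in the simplest possible setting: a single pair of sequences, with no tree to estimate. The sketch below rests on assumptions that are not in the text. Sites are treated as independent draws from three classes (identical, transitional difference, transversional difference); under the Jukes-Cantor null hypothesis the two difference classes have probabilities in the ratio 1:2, while under the Kimura alternative the sample proportions are taken as the maximum likelihood estimates, which holds only when they lie in the admissible range of the model. The function name is hypothetical, and the chi-square reference distribution with one degree of freedom is exactly the approximation whose validity is at issue here.

```python
import math

def lrt_jc_vs_kimura(N, n1, n2):
    """Return (-2 log lambda, approximate p-value) for one pair of sequences.

    N  : number of sites compared
    n1 : sites with a transitional difference
    n2 : sites with a transversional difference
    """
    if min(n1, n2) <= 0 or n1 + n2 >= N:
        raise ValueError("need n1 > 0, n2 > 0 and n1 + n2 < N")
    n0 = N - n1 - n2
    p = (n1 + n2) / N
    # Alternative (Kimura): class probabilities estimated by sample proportions.
    loglik_kimura = (n0 * math.log(n0 / N) + n1 * math.log(n1 / N)
                     + n2 * math.log(n2 / N))
    # Null (Jukes-Cantor): transitions p/3, transversions 2p/3.
    loglik_jc = (n0 * math.log(1.0 - p) + n1 * math.log(p / 3.0)
                 + n2 * math.log(2.0 * p / 3.0))
    statistic = max(0.0, 2.0 * (loglik_kimura - loglik_jc))
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

statistic, p_value = lrt_jc_vs_kimura(3000, 210, 90)
null_statistic, null_p_value = lrt_jc_vs_kimura(3000, 100, 200)
print(statistic, p_value, null_statistic)
```

The second call uses counts in the exact 1:2 ratio expected under the Jukes-Cantor model, for which the statistic is zero up to rounding.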

There are two aspects of this test that deserve discussion. The first is 
that the null hypothesis (that the Jukes-Cantor model holds) is a particular 
case of, or is “nested within”, the alternative hypothesis (that the Kimura
two-parameter model holds). Thus the first criterion listed above for the
use of the $-2 \log \lambda$ approach is satisfied. The second also appears, at first
sight, to be satisfied, but if the topology of the phylogenetic tree, which can 
loosely be thought of as a parameter, is estimated in the procedure, then 
the second criterion is not satisfied, since the shape of the phylogenetic tree 
is not a real number. 

Even if the shape of the phylogenetic tree is given a priori, this last 
problem still arises. Part of the estimation procedure is to estimate various 
DNA sequences at the internal nodes of the phylogenetic tree, and these 
sequences are not real numbers. 

As another problem, suppose that the null hypothesis is the Jukes- 
Cantor model and the alternative hypothesis is the simple Felsenstein model 
(12.14) with stationary probability values equal to the observed values in 
the data. Then neither model is nested within the other and there is no 
theoretical support for the claim that the null hypothesis distribution of
$-2 \log \lambda$ is chi-square. Whelan and Goldman (1999) show that the null
hypothesis distribution of $-2 \log \lambda$ is indeed not close to a chi-square in this
case, and that in fact, negative values of $-2 \log \lambda$ can arise, an impossibility
for a random variable truly having a chi-square distribution.

A third problem concerns testing for a monophyletic group, or clade. A 
collection of species derived from some internal node in the phylogenetic 
tree is called a monophyletic group if no other species descends from this 
node. It is often of interest to test whether some group of species of special
interest forms a monophyletic group. Here the null hypothesis is that this
collection of species does indeed form a monophyletic group. The maximum
of the likelihood of the data under this hypothesis can in principle be formed
as well as the maximum of the likelihood of the data when no monophyly
claim is made. A value of $-2 \log \lambda$ can then, in principle, be calculated, but
this does not have a chi-square distribution under the null hypothesis, since 
the shape of the phylogenetic tree is estimated as part of both likelihood 
procedures. 

Apart from these and other statistical issues, there are very difficult 
problems of computing to be overcome when likelihood calculations are 
made. We do not discuss these, or further statistical problems, here: An 
excellent summary of these matters is given by Goldman, Anderson and 
Rodrigo (2000). 




Appendix A 

Eigenvalue Calculations 



Let $X_t$ ($t = 0, 1, 2, \ldots$) be a (possibly vector) Markovian random variable
with state space $\{0, 1, \ldots, M\}$ and transition matrix $P$. Suppose that
$p_{00} = p_{MM} = 1$, that the states $\{1, 2, \ldots, M-1\}$ are transient, and that
there exists an integer $m$ such that $p_{ij}^{(m)} > 0$ for $1 \le i \le M-1$ and all $j$.

Suppose further that a function $f(X)$ exists such that $f(0) = f(M) = 0$,
$f(i) > 0$ otherwise, and for which

\[ E\{f(X_{t+1}) \mid X_t\} = \lambda_2 f(X_t) \tag{A.1} \]

for some constant $\lambda_2$. Then $\lambda_2$ is real and positive and is the leading nonunit
eigenvalue of $P$.

The proof is almost immediate. The matrix $P$ has two unit eigenvalues,
and if the first and last rows and columns of $P$ are removed, the
remaining eigenvalues of $P$ are those of the resultant matrix $Q$. Denoting
$(f(1), \ldots, f(M-1))$ by $\mathbf{f}'$, we see that (A.1) and the assumption that
$f(0) = f(M) = 0$ show that

\[ Q\mathbf{f} = \lambda_2 \mathbf{f}. \]

Since the matrix Q satisfies the conditions of Theorem 2.2 of Karlin and 
Taylor (1975, p. 545), the Frobenius theory of their Theorem 2.1 proves the 
desired result. 
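The result can be illustrated numerically. The sketch below uses, as an assumed example that is not part of this appendix, a binomial (Wright-Fisher type) chain in which $X_{t+1}$ given $X_t = i$ is binomial with index $M$ and parameter $i/M$, together with the function $f(i) = i(M - i)$, for which (A.1) holds with $\lambda_2 = 1 - 1/M$. The leading eigenvalue of $Q$ is then found independently by power iteration.

```python
import math

def wright_fisher_matrix(M):
    """Binomial transition matrix: row i is Binomial(M, i/M). Illustrative choice."""
    return [[math.comb(M, j) * (i / M) ** j * (1 - i / M) ** (M - j)
             for j in range(M + 1)] for i in range(M + 1)]

M = 10
P = wright_fisher_matrix(M)
f = [i * (M - i) for i in range(M + 1)]   # f(0) = f(M) = 0, f(i) > 0 otherwise
lambda_2 = 1 - 1 / M

# (A.1): sum_j P[i][j] f(j) should equal lambda_2 * f(i) for every state i.
a1_gap = max(abs(sum(P[i][j] * f[j] for j in range(M + 1)) - lambda_2 * f[i])
             for i in range(M + 1))

# Q is P with the first and last rows and columns removed.
Q = [row[1:M] for row in P[1:M]]

# Power iteration from a positive start vector converges to the leading eigenvalue of Q.
v = [1.0 / (M - 1)] * (M - 1)
leading = 0.0
for _ in range(300):
    w = [sum(Q[i][j] * v[j] for j in range(M - 1)) for i in range(M - 1)]
    leading = sum(w)              # v sums to 1, so this is the growth factor
    v = [x / leading for x in w]  # renormalize so that v again sums to 1

print(a1_gap, leading, lambda_2)
```

The value found by power iteration agrees with $\lambda_2$ to rounding error, and the limiting vector is proportional to $\mathbf{f}$, as the argument above requires.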




Appendix B 

Significance Levels for $\hat{F}$



Empirical$^1$ significance levels (2.5%, 5%, 97.5%) of the test statistic $\hat{F}$ for
given values of $k$ and $n$. “N.S.” means significance is not possible at the
probability level indicated.



                                        k
   n    Prob.      3      5      7     10     15     20     25     30

  100   2.5%     0.36   0.27   0.20   0.15   0.11   0.08   0.06   0.05
        5%       0.40   0.29   0.21   0.16   0.11   0.08   0.07   0.05
        97.5%    N.S.   0.87   0.71   0.48   0.33   0.22   0.15   0.12

  200   2.5%     0.37   0.28   0.22   0.17   0.12   0.09   0.08   0.06
        5%       0.41   0.30   0.23   0.18   0.13   0.10   0.08   0.07
        97.5%    N.S.   0.89   0.78   0.63   0.41   0.29   0.23   0.17

  300   2.5%     0.38   0.29   0.23   0.17   0.12   0.10   0.08   0.07
        5%       0.43   0.31   0.24   0.19   0.13   0.11   0.08   0.07
        97.5%    N.S.   0.93   0.83   0.68   0.48   0.34   0.26   0.20

  400   2.5%     0.41   0.29   0.23   0.17   0.13   0.10   0.08   0.07
        5%       0.45   0.31   0.25   0.19   0.14   0.11   0.09   0.08
        97.5%    0.99   0.93   0.86   0.71   0.51   0.35   0.28   0.21

  500   2.5%     0.40   0.28   0.24   0.18   0.13   0.11   0.09   0.07
        5%       0.45   0.31   0.25   0.20   0.15   0.11   0.09   0.08
        97.5%    0.99   0.93   0.86   0.74   0.52   0.41   0.31   0.24



$^1$ Based on 1,000 independent drawings for each $(k, n)$ combination from the
distribution (9.30). Values by kind courtesy of R. Anderson.




Appendix C 

Means and Variances of F 



Values of $E(F \mid k)$ and $\mathrm{var}(F \mid k)$ for various $k$, $n$ values. Values by kind
courtesy of R. Anderson.



E(F | k)
                                    k
   n        3       5       7      10      15      20      25      30

  100     0.671   0.490   0.376   0.271   0.176   0.125   0.094   0.073
  200     0.705   0.532   0.421   0.313   0.212   0.156   0.120   0.096
  300     0.722   0.554   0.444   0.336   0.232   0.173   0.135   0.110
  400     0.732   0.568   0.459   0.351   0.245   0.185   0.146   0.119
  500     0.740   0.579   0.470   0.362   0.255   0.193   0.153   0.126


var(F | k)
                                    k
   n        3       5       7      10      15      20      25      30

  100    0.0325  0.0254  0.0169  0.0089  0.0033  0.0013  0.0006  0.0003
  200    0.0350  0.0306  0.0224  0.0133  0.0058  0.0028  0.0014  0.0008
  300    0.0359  0.0331  0.0253  0.0159  0.0075  0.0038  0.0021  0.0012
  400    0.0364  0.0346  0.0272  0.0176  0.0087  0.0046  0.0026  0.0015
  500    0.0366  0.0356  0.0286  0.0190  0.0096  0.0052  0.0030  0.0018